Re: [I] [Question]Getting file info [incubator-devlake]

via GitHub Mon, 12 Aug 2024 14:51:30 -0700


dosubot[bot] commented on issue #7877:
URL: 
https://github.com/apache/incubator-devlake/issues/7877#issuecomment-2284966048


   To use the `customize` plugin in Apache DevLake to populate the 
`commit_files` table, follow these steps:
   
   1. **Define the Transformation Rules**: Create a JSON configuration that 
specifies the transformation rules for extracting data from the raw layer 
tables and populating the `commit_files` table.
   
   2. **Trigger Data Extraction**: Submit the JSON configuration to DevLake as 
a pipeline plan to start the extraction.
   
   Here is an example JSON configuration to extract file changes from a raw 
layer table (e.g., `_raw_gitlab_api_commit_files`) and populate the 
`commit_files` table:
   
   ```json
   [
     [
       {
         "plugin": "customize",
         "options": {
           "transformationRules": [
             {
               "table": "commit_files",
               "rawDataTable": "_raw_gitlab_api_commit_files",
               "rawDataParams": "{\"ConnectionId\":1,\"ProjectId\":123}",
               "mapping": {
                 "x_file_path": "file_path",
                 "x_commit_sha": "commit_sha",
                 "x_change_type": "change_type"
               }
             }
           ]
         }
       }
     ]
   ]
   ```
   
   In this example:
   - `table`: Specifies the domain layer table name (`commit_files`).
   - `rawDataTable`: Specifies the raw layer table from which data is extracted 
(`_raw_gitlab_api_commit_files`).
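   - `rawDataParams`: Specifies the filter to select records from the raw layer 
table. The value is a JSON-encoded string, not an object, which is why its 
inner quotes are escaped in the example.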
   - `mapping`: Specifies the extraction rules, where the key is the extension 
field name, and the value is the JSON path to extract the value from the raw 
data.
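The escaping in `rawDataParams` is easy to get wrong when editing by hand. The following is a minimal sketch, not part of the plugin, that builds the same plan from two numeric arguments. The function name `build_plan` and the variable `raw_params` are hypothetical, and the table and field names are simply copied from the example above.

```shell
# Hypothetical helper: prints the customize plan shown above as compact JSON.
# $1: connection ID, $2: project ID (both assumed to be plain numbers).
build_plan() {
  # rawDataParams is a string, so its inner quotes are emitted as \".
  raw_params=$(printf '{\\"ConnectionId\\":%s,\\"ProjectId\\":%s}' "$1" "$2")
  printf '[[{"plugin":"customize","options":{"transformationRules":[{'
  printf '"table":"commit_files",'
  printf '"rawDataTable":"_raw_gitlab_api_commit_files",'
  printf '"rawDataParams":"%s",' "$raw_params"
  printf '"mapping":{"x_file_path":"file_path","x_commit_sha":"commit_sha","x_change_type":"change_type"}'
  printf '}]}}]]\n'
}
```

For example, `build_plan 1 123 > plan.json` writes the plan to a file that can be reviewed before it is submitted.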
   
   3. **Trigger Data Extraction via API**: You can trigger the data extraction 
by making a POST request to the `/pipelines` endpoint with the JSON 
configuration.
   
   Here is an example curl command to trigger the data extraction:
   
   ```shell
   curl 'http://localhost:8080/pipelines' \
   --header 'Content-Type: application/json' \
   --data-raw '
   {
       "name": "extract commit files",
       "plan": [
           [
               {
                   "plugin": "customize",
                   "options": {
                       "transformationRules": [
                           {
                               "table": "commit_files",
                               "rawDataTable": "_raw_gitlab_api_commit_files",
                               "rawDataParams": "{\"ConnectionId\":1,\"ProjectId\":123}",
                               "mapping": {
                                   "x_file_path": "file_path",
                                   "x_commit_sha": "commit_sha",
                                   "x_change_type": "change_type"
                               }
                           }
                       ]
                   }
               }
           ]
       ]
   }
   '
   ```
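The request above can also be split into two small steps, which makes failures easier to spot. This is a sketch under stated assumptions: `make_pipeline_body` and `submit_pipeline` are hypothetical helper names, `DEVLAKE_URL` is an assumed variable defaulting to the URL used in the curl example, and the plan is assumed to be stored in a file.

```shell
# Assumed base URL, taken from the curl example above; adjust for your deployment.
DEVLAKE_URL="${DEVLAKE_URL:-http://localhost:8080}"

# Wrap a plan in a pipeline request body.
# $1: pipeline name (must not contain double quotes or backslashes)
# $2: path to a file holding the plan JSON
make_pipeline_body() {
  printf '{"name": "%s", "plan": %s}\n' "$1" "$(cat "$2")"
}

# POST a request body to the /pipelines endpoint.
# $1: path to a file holding the request body
# --fail makes curl exit non-zero on an HTTP error status.
submit_pipeline() {
  curl --silent --show-error --fail \
    --header 'Content-Type: application/json' \
    --data-binary @"$1" \
    "$DEVLAKE_URL/pipelines"
}
```

A possible usage is `make_pipeline_body 'extract commit files' plan.json > body.json` followed by `submit_pipeline body.json`. Checking the exit status of the second command shows whether the server accepted the request.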
   
   With this configuration, the API request should populate the mapped `x_` 
columns of the `commit_files` table with values extracted from the raw layer 
table, provided that `rawDataTable` and `rawDataParams` match the data you 
collected 
<sup>[[1]](https://github.com/apache/incubator-devlake-website/blob/main/docs/Plugins/customize.md)</sup>.
   
   **Note:** The names of columns added via the `customize` plugin must start 
with the prefix `x_`. The supported data types for the columns are 
`varchar(255)`, `text`, `bigint`, `float`, `timestamp`, and `array`.
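The prefix rule in the note can be checked before a plan is submitted. This is only an illustrative sketch: `is_valid_custom_column` is a hypothetical helper, and it tests the `x_` prefix alone, not the data type.

```shell
# Hypothetical pre-flight check: succeeds only if the column name
# starts with x_ and has at least one character after the prefix.
is_valid_custom_column() {
  case "$1" in
    x_?*) return 0 ;;
    *)    return 1 ;;
  esac
}

# Report which of the given column names break the prefix rule.
check_columns() {
  status=0
  for name in "$@"; do
    if ! is_valid_custom_column "$name"; then
      printf 'invalid column name: %s\n' "$name" >&2
      status=1
    fi
  done
  return "$status"
}
```

Running `check_columns x_file_path x_commit_sha x_change_type` succeeds silently, while a name such as `file_path` is reported on standard error and produces a non-zero exit status.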
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

