keon94 commented on code in PR #318:
URL:
https://github.com/apache/incubator-devlake-website/pull/318#discussion_r1042573905
##########
docs/DeveloperManuals/PluginImplementation.md:
##########
@@ -440,90 +443,96 @@ func (plugin GithubSinger) PrepareTaskData(taskCtx
core.TaskContext, options map
return nil, errors.Default.Wrap(err, "unable to get GithubSinger
connection by the given connection ID")
}
endpoint := strings.TrimSuffix(connection.Endpoint, "/")
- config := &models.GithubConfig{
- AccessToken: connection.Token,
- Repository: options["repo"].(string),
- StartDate: options["start_date"].(time.Time),
- RequestTimeout: 300,
- BaseUrl: endpoint,
- }
- op.TapProvider = func() (tap.Tap, errors.Error) {
- return helper.NewSingerTapClient(&helper.SingerTapArgs{
- Config: config,
- TapClass: "TAP_GITHUB",
- StreamPropertiesFile: "github.json",
- })
+ tapClient, err := tap.NewSingerTap(&tap.SingerTapConfig{
+ TapExecutable: "tap-github",
+ StreamPropertiesFile: "github_keon.json",
+ IsLegacy: true,
+ })
+ if err != nil {
+ return nil, err
}
return &tasks.GithubSingerTaskData{
- Options: op,
- Config: config,
+ Options: op,
+ TapClient: tapClient,
+ TapConfig: &models.GithubConfig{
+ AccessToken: connection.Token,
+ Repository: options["repo"].(string),
+ StartDate: options["start_date"].(time.Time),
+ RequestTimeout: 300,
+ BaseUrl: endpoint,
+ },
}, nil
}
```
-Note that the TapClass variable here was set to `"TAP_GITHUB"`. In general,
this will be the name of the environment variable that expands to the full path
of the tap executable.
-The `StreamPropertiesFile` is the name of the properties file of interest, and
is expected to reside in the directory referenced by the environment variable
`"SINGER_PROPERTIES_DIR"`. This directory is
-expected to be shared for all these JSON files. In our example, this directory
is `<devlake-root>/config/singer`.
+Note that the `TapExecutable` field here was set to `"tap-github"`, which is
the name of the Python executable for the tap.
+The `StreamPropertiesFile` is the name of the properties file of interest, and
is expected to reside in the directory referenced by the environment variable
`"TAP_PROPERTIES_DIR"`. This directory is
+expected to be shared for all these JSON files. In our example, this directory
is `<devlake-root>/config/tap`.
Furthermore, observe how we created the `GithubConfig` object: The raw options
needed two variables "repo" and "start_date", and the remaining fields were
derivable from the connection instance.
These details will vary from tap to tap, but the gist will be the same.
-
-*3.4*. Generate the datamodels corresponding to the JSON schemas of the
streams of interest. We have a custom script that gets this job done. See
`singer/singer-model-generator.sh`. For our example, if we care about
-writing an extractor for GitHub Issues, we'll have to refer to the
properties.json (or github.json) file to identify the stream name associated
with it. In this case, it is called "issues". Next, we run the following
-command: ```sh ./scripts/singer-model-generator.sh
"./config/singer/github.json" "issues" "./plugins/github_singer"```. (Make sure
the script has execution permissions - ```sh chmod +x
./scripts/singer-model-generator.sh```.
-
-This will generate Go (raw) datamodels for "issues" and place them under
`github_singer/models/generated`. Do not modify these files manually.
-
-*3.4.1*. Note: Occasionally, the tap properties will not expose all the
supported fields in the JSON schema - you can go and manually add them there in
the JSON file. Additionally, you might run into type-problems (for instance IDs
coming back as strings but declared as integers). In general, these would be
rare scenarios, and technically bugs for the tap that you would experimentally
run into while testing.
-Either way, if you need to modify these data-types, do it in the JSON file.
-
-*3.5*. Since this is a Singer plugin, we won't need collectors. Remove any
previously generated collector.go files under `github_singer/tasks`, and modify
(actually rewrite) the extractors. Here's what the extractor for
-GitHub issues would look like:
+*3.4*. Since this is a Singer plugin, the collector will have to be modified
to look like this:
```go
package tasks
import (
"github.com/apache/incubator-devlake/errors"
+ "github.com/apache/incubator-devlake/helpers/pluginhelper/tap"
"github.com/apache/incubator-devlake/plugins/core"
-
"github.com/apache/incubator-devlake/plugins/github_singer/models/generated"
"github.com/apache/incubator-devlake/plugins/helper"
)
-var _ core.SubTaskEntryPoint = ExtractIssues
+var _ core.SubTaskEntryPoint = CollectIssues
-func ExtractIssues(taskCtx core.SubTaskContext) errors.Error {
+func CollectIssues(taskCtx core.SubTaskContext) errors.Error {
data := taskCtx.GetData().(*GithubSingerTaskData)
- extractor, err := helper.NewTapExtractor(
- &helper.TapExtractorArgs[generated.Issues]{
- Ctx: taskCtx,
- TapProvider: data.Options.TapProvider,
- ConnectionId: data.Options.ConnectionId,
- Extract: func(resData *generated.Issues)
([]interface{}, errors.Error) {
- // TODO decode some db models from api result
- return nil, nil
+ collector, err := tap.NewTapCollector(
+ &tap.CollectorArgs[tap.SingerTapStream]{
+ RawDataSubTaskArgs: helper.RawDataSubTaskArgs{
+ Ctx: taskCtx,
+ Table: "singer_github_issue",
+ Params: GithubApiParams{
+ Repo: data.TapConfig.Repository,
+ Owner: data.Options.Owner,
+ ConnectionId: data.Options.ConnectionId,
+ },
},
- StreamName: "issues",
+ TapClient: data.TapClient,
+ TapConfig: data.TapConfig,
+ ConnectionId: data.Options.ConnectionId,
+ StreamName: "issues",
},
)
if err != nil {
return err
}
- return extractor.Execute()
+ return collector.Execute()
}
-var ExtractIssuesMeta = core.SubTaskMeta{
- Name: "ExtractIssues",
- EntryPoint: ExtractIssues,
+var CollectIssuesMeta = core.SubTaskMeta{
+ Name: "CollectIssues",
+ EntryPoint: CollectIssues,
EnabledByDefault: true,
- Description: "Extract singer-tap Github issues",
+ Description: "Collect singer-tap Github issues",
}
```
-The `Extract` function is where you write the "normalization" logic that
transforms the raw datatypes to the "tools" ones. Note that framework uses
Generics to simplify some boilerplate. The
-Generic type for this example is ```generated.Issues``` which the generator
script from step 4 produced. This function, just like for conventional plugins,
should return tools-normalized type(s).
-The `StreamName` variable is self-explanatory: the stream name according to
the properties JSON.
-*3.6*. The remaining steps are just like what you would do for conventional
plugins (e.g. the REST APIs, migrations, etc).
+
+*3.5*. Generate the datamodels corresponding to the JSON schemas of the
streams of interest. These make life easy at the Extractor stage as we will not
need to write "Response" structs by hand.
Review Comment:
done
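A side note on the `PrepareTaskData` hunk above: `options["repo"].(string)` and `options["start_date"].(time.Time)` are unchecked type assertions, so a missing or mistyped option panics at runtime. Below is a minimal sketch of a checked alternative. The `tapOptions` type and `parseTapOptions` helper are hypothetical names used only for illustration and are not part of DevLake, and accepting an RFC 3339 string for `start_date` is an assumption about how the options might arrive.

```go
package main

import (
	"fmt"
	"time"
)

// tapOptions is a hypothetical holder for the two raw options the doc example reads.
type tapOptions struct {
	Repo      string
	StartDate time.Time
}

// parseTapOptions validates "repo" and "start_date" with checked assertions,
// returning an error instead of panicking on a missing or mistyped value.
func parseTapOptions(options map[string]interface{}) (*tapOptions, error) {
	repo, ok := options["repo"].(string)
	if !ok || repo == "" {
		return nil, fmt.Errorf("option %q must be a non-empty string", "repo")
	}

	var start time.Time
	switch v := options["start_date"].(type) {
	case time.Time:
		start = v
	case string:
		// Assumption: a string value is an RFC 3339 timestamp.
		parsed, err := time.Parse(time.RFC3339, v)
		if err != nil {
			return nil, fmt.Errorf("option %q is not RFC 3339: %w", "start_date", err)
		}
		start = parsed
	default:
		// Also reached when the key is absent (nil interface value).
		return nil, fmt.Errorf("option %q must be a time.Time or an RFC 3339 string", "start_date")
	}

	return &tapOptions{Repo: repo, StartDate: start}, nil
}

func main() {
	opts, err := parseTapOptions(map[string]interface{}{
		"repo":       "apache/incubator-devlake",
		"start_date": "2022-01-01T00:00:00Z",
	})
	if err != nil {
		fmt.Println("invalid options:", err)
		return
	}
	fmt.Println(opts.Repo, opts.StartDate.Year())
}
```

In the plugin itself the returned error would presumably be wrapped the same way the connection lookup error is wrapped in the hunk, but that wiring is left out here.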
##########
docs/DeveloperManuals/PluginImplementation.md:
##########
@@ -440,90 +443,96 @@ func (plugin GithubSinger) PrepareTaskData(taskCtx
core.TaskContext, options map
return nil, errors.Default.Wrap(err, "unable to get GithubSinger
connection by the given connection ID")
}
endpoint := strings.TrimSuffix(connection.Endpoint, "/")
- config := &models.GithubConfig{
- AccessToken: connection.Token,
- Repository: options["repo"].(string),
- StartDate: options["start_date"].(time.Time),
- RequestTimeout: 300,
- BaseUrl: endpoint,
- }
- op.TapProvider = func() (tap.Tap, errors.Error) {
- return helper.NewSingerTapClient(&helper.SingerTapArgs{
- Config: config,
- TapClass: "TAP_GITHUB",
- StreamPropertiesFile: "github.json",
- })
+ tapClient, err := tap.NewSingerTap(&tap.SingerTapConfig{
+ TapExecutable: "tap-github",
+ StreamPropertiesFile: "github_keon.json",
+ IsLegacy: true,
+ })
+ if err != nil {
+ return nil, err
}
return &tasks.GithubSingerTaskData{
- Options: op,
- Config: config,
+ Options: op,
+ TapClient: tapClient,
+ TapConfig: &models.GithubConfig{
+ AccessToken: connection.Token,
+ Repository: options["repo"].(string),
+ StartDate: options["start_date"].(time.Time),
+ RequestTimeout: 300,
+ BaseUrl: endpoint,
+ },
}, nil
}
```
-Note that the TapClass variable here was set to `"TAP_GITHUB"`. In general,
this will be the name of the environment variable that expands to the full path
of the tap executable.
-The `StreamPropertiesFile` is the name of the properties file of interest, and
is expected to reside in the directory referenced by the environment variable
`"SINGER_PROPERTIES_DIR"`. This directory is
-expected to be shared for all these JSON files. In our example, this directory
is `<devlake-root>/config/singer`.
+Note that the `TapExecutable` field here was set to `"tap-github"`, which is
the name of the Python executable for the tap.
+The `StreamPropertiesFile` is the name of the properties file of interest, and
is expected to reside in the directory referenced by the environment variable
`"TAP_PROPERTIES_DIR"`. This directory is
+expected to be shared for all these JSON files. In our example, this directory
is `<devlake-root>/config/tap`.
Furthermore, observe how we created the `GithubConfig` object: The raw options
needed two variables "repo" and "start_date", and the remaining fields were
derivable from the connection instance.
These details will vary from tap to tap, but the gist will be the same.
-
-*3.4*. Generate the datamodels corresponding to the JSON schemas of the
streams of interest. We have a custom script that gets this job done. See
`singer/singer-model-generator.sh`. For our example, if we care about
-writing an extractor for GitHub Issues, we'll have to refer to the
properties.json (or github.json) file to identify the stream name associated
with it. In this case, it is called "issues". Next, we run the following
-command: ```sh ./scripts/singer-model-generator.sh
"./config/singer/github.json" "issues" "./plugins/github_singer"```. (Make sure
the script has execution permissions - ```sh chmod +x
./scripts/singer-model-generator.sh```.
-
-This will generate Go (raw) datamodels for "issues" and place them under
`github_singer/models/generated`. Do not modify these files manually.
-
-*3.4.1*. Note: Occasionally, the tap properties will not expose all the
supported fields in the JSON schema - you can go and manually add them there in
the JSON file. Additionally, you might run into type-problems (for instance IDs
coming back as strings but declared as integers). In general, these would be
rare scenarios, and technically bugs for the tap that you would experimentally
run into while testing.
-Either way, if you need to modify these data-types, do it in the JSON file.
-
-*3.5*. Since this is a Singer plugin, we won't need collectors. Remove any
previously generated collector.go files under `github_singer/tasks`, and modify
(actually rewrite) the extractors. Here's what the extractor for
-GitHub issues would look like:
+*3.4*. Since this is a Singer plugin, the collector will have to be modified
to look like this:
```go
package tasks
import (
"github.com/apache/incubator-devlake/errors"
+ "github.com/apache/incubator-devlake/helpers/pluginhelper/tap"
"github.com/apache/incubator-devlake/plugins/core"
-
"github.com/apache/incubator-devlake/plugins/github_singer/models/generated"
"github.com/apache/incubator-devlake/plugins/helper"
)
-var _ core.SubTaskEntryPoint = ExtractIssues
+var _ core.SubTaskEntryPoint = CollectIssues
-func ExtractIssues(taskCtx core.SubTaskContext) errors.Error {
+func CollectIssues(taskCtx core.SubTaskContext) errors.Error {
data := taskCtx.GetData().(*GithubSingerTaskData)
- extractor, err := helper.NewTapExtractor(
- &helper.TapExtractorArgs[generated.Issues]{
- Ctx: taskCtx,
- TapProvider: data.Options.TapProvider,
- ConnectionId: data.Options.ConnectionId,
- Extract: func(resData *generated.Issues)
([]interface{}, errors.Error) {
- // TODO decode some db models from api result
- return nil, nil
+ collector, err := tap.NewTapCollector(
+ &tap.CollectorArgs[tap.SingerTapStream]{
+ RawDataSubTaskArgs: helper.RawDataSubTaskArgs{
+ Ctx: taskCtx,
+ Table: "singer_github_issue",
+ Params: GithubApiParams{
+ Repo: data.TapConfig.Repository,
+ Owner: data.Options.Owner,
+ ConnectionId: data.Options.ConnectionId,
+ },
},
- StreamName: "issues",
+ TapClient: data.TapClient,
+ TapConfig: data.TapConfig,
+ ConnectionId: data.Options.ConnectionId,
+ StreamName: "issues",
},
)
if err != nil {
return err
}
- return extractor.Execute()
+ return collector.Execute()
}
-var ExtractIssuesMeta = core.SubTaskMeta{
- Name: "ExtractIssues",
- EntryPoint: ExtractIssues,
+var CollectIssuesMeta = core.SubTaskMeta{
+ Name: "CollectIssues",
+ EntryPoint: CollectIssues,
EnabledByDefault: true,
- Description: "Extract singer-tap Github issues",
+ Description: "Collect singer-tap Github issues",
}
```
-The `Extract` function is where you write the "normalization" logic that
transforms the raw datatypes to the "tools" ones. Note that framework uses
Generics to simplify some boilerplate. The
-Generic type for this example is ```generated.Issues``` which the generator
script from step 4 produced. This function, just like for conventional plugins,
should return tools-normalized type(s).
-The `StreamName` variable is self-explanatory: the stream name according to
the properties JSON.
-*3.6*. The remaining steps are just like what you would do for conventional
plugins (e.g. the REST APIs, migrations, etc).
+
+*3.5*. Generate the datamodels corresponding to the JSON schemas of the
streams of interest. These make life easy at the Extractor stage as we will not
need to write "Response" structs by hand.
+We have a custom script that gets this job done. See
`scripts/singer-model-generator.sh`. For our example, if we care about
+writing an extractor for GitHub Issues, we'll have to refer to the
properties.json (or github.json) file to identify the stream name associated
with it. In this case, it is called "issues". Next, we run the following
+command: ```sh ./scripts/singer-model-generator.sh "./config/tap/github.json"
"./plugins/github_singer" "issues"```. (Make sure the script has execution
permissions - ```sh chmod +x ./scripts/singer-model-generator.sh```.)
+For the sake of convenience, the script supports an `--all` flag in place of
the stream name. This will generate source files for all streams. Also, see the
`tap-models` target in the Makefile for reference, and add your invocations
+there.
+
+This will generate Go (raw) datamodels and place them under
`github_singer/models/generated`. Do not modify these files manually.
Review Comment:
done