davidzollo commented on PR #10447: URL: https://github.com/apache/seatunnel/pull/10447#issuecomment-3863658151
Thank you for the contribution! This is a good starting point for the HubSpot connector. I found several critical issues that need to be addressed before this can be merged. **1. Invalid Default `content_field` causing Runtime Failure** In `HubSpotSourceParameter.java`, the default `content_field` is hardcoded as `"results"`. The strict JSON path implementation in `HttpSourceReader` expects a valid JsonPath expression (e.g., usually starting with `$.`). Passing `"results"` will likely cause a `JsonPathCompile` exception or fail to extract data correctly at runtime. **Action**: Change the default value to `$.results` (to return the array) or `$.results.*`? **2. Mismatch between Default Schema and E2E Test Expectations** - `HubSpotSource` currently reuses `HttpSource` without injecting a default schema or `json_field` configuration. This means, by default, it produces a single column named `content` (containing the raw JSON string). - However, your E2E test (`hubspot_source_case.conf`) asserts the presence of an `id` field (`field_name = "id"`). - **Result**: The test validates a scenario that doesn't match the actual default behavior of the connector. **Action**: Either (A) Configure default `json_field` mappings in `HubSpotSource` so fields like `id` are extracted automatically, OR (B) Update the E2E test config to explicitly define the `schema` and `json_field` mappings required to parse the `id`. **3. E2E Test Network Configuration is Incorrect** The `HubSpotIT.java` sets up a `MockServer` container but fails to place it in the shared network used by the SeaTunnel engine container. - Missing `.withNetwork(NETWORK)` and `.withNetworkAliases(...)` in the container setup. - The configuration `url = "http://mock-server"` in the conf file cannot be resolved by the SeaTunnel container because the network alias is invalid. **Action**: Please copy the networking setup pattern from `HttpIT.java` (using `TestSuiteBase.NETWORK` and setting an alias) to ensure the test actually hits the mock server. **4. `OptionRule` Validation Inconsistency** In `HubSpotSourceFactory`, the `optionRule` only declares `ACCESS_TOKEN` and `OBJECT_TYPE`. However, your test case tries to inject a `url` parameter (`url = "http://mock-server"`). - If strict validation is enabled in the future, this will fail. - You are also manually reading `"url"` in `HubSpotSourceParameter` using a disjointed `Options.key("url")` instead of reusing `HttpCommonOptions.URL`. **Action**: If you intend to allow users/tests to override `url`, please add `HttpCommonOptions.URL` to the `OptionRule` in the factory and reuse that constant in `HubSpotSourceParameter`. **5. Dependency Scope in `seatunnel-dist`** In `seatunnel-dist/pom.xml`, the new `connector-http-hubspot` dependency is missing `<scope>provided</scope>`, unlike other HTTP connectors. **Action**: Please add `<scope>provided</scope>` to align with the project's packaging strategy and avoid unnecessary artifact bloat. **6. Documentation Accuracy** In `Hubspot.md`, the "Key features" section marks `parallelism` and `exactly-once` as supported (`[x]`). Since this connector extends `HttpSource` (which is typically single-split/bounded for batch), please verify if these claims are strictly true or just inherited boilerplate. It is better to be accurate than to overpromise. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
