chl-wxp commented on issue #10339:
URL: https://github.com/apache/seatunnel/issues/10339#issuecomment-3755306048
> Thanks for the proposal. This is a valid feature gap: current Metalake integration only supports placeholder replacement for credentials, not schema retrieval from Gravitino.
>
> **Evidence:**
>
> * `seatunnel-api/src/main/java/org/apache/seatunnel/api/metalake/MetalakeConfigUtils.java` (L76-84) only fetches properties for placeholder substitution
> * `seatunnel-connectors-v2/connector-mongodb/src/main/java/org/apache/seatunnel/connectors/seatunnel/mongodb/source/MongodbSourceFactory.java` (L47-52) requires inline `schema` as mandatory
> * `seatunnel-connectors-v2/connector-file/connector-file-base/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/config/BaseFileSourceConfig.java` (L83-89) uses inline schema or falls back to simple text table
>
> **Suggested starting points:**
>
> 1. `seatunnel-api/src/main/java/org/apache/seatunnel/api/options/table/TableSchemaOptions.java`: Add `schema_path`, `schema_url` options
> 2. `seatunnel-api/src/main/java/org/apache/seatunnel/api/metalake/MetalakeClient.java`: Add `getTableSchema(String tableRef)` method
> 3. `seatunnel-api/src/main/java/org/apache/seatunnel/api/metalake/GravitinoClient.java`: Implement REST call to `/tables/{table}`
> 4. `seatunnel-connectors-v2/connector-file/connector-file-base/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/config/BaseFileSourceConfig.java`: Extend schema resolution logic
>
> **Questions to clarify:**
>
> 1. Should non-relational catalogs (File/ES/MongoDB) use new Gravitino catalog providers or reuse existing `fileset`/`messaging` types?
> 2. How should complex types (ES nested/object, MongoDB documents) be mapped from Gravitino to SeaTunnel data types?

1. Schema detection should rely on the existing `GravitinoClient` wherever
possible.
2. Gravitino schema retrieval is a new, optional feature; the existing inline
`schema` option will not be replaced.
3. Gravitino is not only a data source center. More importantly, it is a
metadata center that unifies data types and can serve as the single source of
truth for data types in non-relational databases. As I mentioned in the design
document, after requesting the schema through the REST API, I will not process
it with placeholder replacement. The correct approach is to map Gravitino types
to SeaTunnel types. The following is an excerpt of the response I get when
requesting the structure of a PostgreSQL table:
```json
[
  {
    "name": "id",
    "type": "integer",
    "nullable": true,
    "autoIncrement": false,
    "defaultValue": {
      "type": "literal",
      "dataType": "null",
      "value": "NULL"
    }
  },
  {
    "name": "big_number",
    "type": "long",
    "nullable": true,
    "autoIncrement": false,
    "defaultValue": {
      "type": "literal",
      "dataType": "null",
      "value": "NULL"
    }
  },
  {
    "name": "small_number",
    "type": "integer",
    "nullable": true,
    "autoIncrement": false,
    "defaultValue": {
      "type": "literal",
      "dataType": "null",
      "value": "NULL"
    }
  },
  {
    "name": "tiny_number",
    "type": "short",
    "nullable": true,
    "autoIncrement": false,
    "defaultValue": {
      "type": "literal",
      "dataType": "null",
      "value": "NULL"
    }
  }
]
```
As the example shows, Gravitino unifies the types: the type names it returns
are its own (`integer`, `long`, `short`), not those of the original database.
I have asked the Gravitino maintainers for confirmation.
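
To make the mapping idea concrete, here is a minimal sketch of a lookup from Gravitino type names to SeaTunnel types. It covers only the three names that appear in the response above (`integer`, `long`, `short`). The class `GravitinoTypeMapper`, the method `toSeaTunnel`, and the local `SeaTunnelType` enum are hypothetical names used for illustration; a real implementation would presumably return SeaTunnel's own `SeaTunnelDataType` instances, and the target types chosen here are assumptions.

```java
import java.util.Locale;

/** Hypothetical helper for illustration; not part of SeaTunnel or Gravitino. */
final class GravitinoTypeMapper {

    /**
     * Local stand-in for SeaTunnel's data types (assumption). A real
     * implementation would return SeaTunnelDataType instances instead.
     */
    enum SeaTunnelType { SMALLINT, INT, BIGINT }

    private GravitinoTypeMapper() {}

    /**
     * Maps a Gravitino type name, as returned in the "type" field of a
     * column, to a SeaTunnel type. Only the names seen in the example
     * response are handled; everything else is rejected explicitly so
     * that unmapped types fail loudly instead of being guessed.
     */
    static SeaTunnelType toSeaTunnel(String gravitinoType) {
        if (gravitinoType == null) {
            throw new IllegalArgumentException("Gravitino type must not be null");
        }
        // Normalize case and whitespace before the lookup.
        switch (gravitinoType.trim().toLowerCase(Locale.ROOT)) {
            case "short":
                return SeaTunnelType.SMALLINT;
            case "integer":
                return SeaTunnelType.INT;
            case "long":
                return SeaTunnelType.BIGINT;
            default:
                throw new IllegalArgumentException(
                        "Unsupported Gravitino type: " + gravitinoType);
        }
    }
}
```

For the columns above, `toSeaTunnel("long")` would give `BIGINT` for `big_number` and `toSeaTunnel("short")` would give `SMALLINT` for `tiny_number`. Complex types (ES nested/object, MongoDB documents) are deliberately left out, since their mapping is still an open question.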