chl-wxp commented on issue #10339:
URL: https://github.com/apache/seatunnel/issues/10339#issuecomment-3755306048

   > Thanks for the proposal. This is a valid feature gap: current Metalake integration only supports placeholder replacement for credentials, not schema retrieval from Gravitino.
   > 
   > **Evidence:**
   > 
   > * `seatunnel-api/src/main/java/org/apache/seatunnel/api/metalake/MetalakeConfigUtils.java` (L76-84) only fetches properties for placeholder substitution
   > * `seatunnel-connectors-v2/connector-mongodb/src/main/java/org/apache/seatunnel/connectors/seatunnel/mongodb/source/MongodbSourceFactory.java` (L47-52) requires inline `schema` as mandatory
   > * `seatunnel-connectors-v2/connector-file/connector-file-base/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/config/BaseFileSourceConfig.java` (L83-89) uses inline schema or falls back to simple text table
   > 
   > **Suggested starting points:**
   > 
   > 1. `seatunnel-api/src/main/java/org/apache/seatunnel/api/options/table/TableSchemaOptions.java`: Add `schema_path`, `schema_url` options
   > 2. `seatunnel-api/src/main/java/org/apache/seatunnel/api/metalake/MetalakeClient.java`: Add `getTableSchema(String tableRef)` method
   > 3. `seatunnel-api/src/main/java/org/apache/seatunnel/api/metalake/GravitinoClient.java`: Implement REST call to `/tables/{table}`
   > 4. `seatunnel-connectors-v2/connector-file/connector-file-base/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/config/BaseFileSourceConfig.java`: Extend schema resolution logic
   > 
   > **Questions to clarify:**
   > 
   > 1. Should non-relational catalogs (File/ES/MongoDB) use new Gravitino catalog providers or reuse existing `fileset`/`messaging` types?
   > 2. How should complex types (ES nested/object, MongoDB documents) be mapped from Gravitino to SeaTunnel data types?
   
   1. Schema detection should build on the existing `GravitinoClient` wherever possible.
   2. Gravitino is added as a new, optional feature; the existing `schema` option is not replaced.
   3. Gravitino is not only a data source center. It plays a more important role as a metadata center that unifies data types, so it can serve as the single source of truth for data types in non-relational databases. As I mentioned in the design document, after requesting the schema from the REST API I will not process the result through placeholder substitution. The correct approach is to map Gravitino types to SeaTunnel types. The following is an excerpt of the response I get when requesting the structure of a PostgreSQL table:
   ```json
   [
     {
       "name": "id",
       "type": "integer",
       "nullable": true,
       "autoIncrement": false,
       "defaultValue": {
         "type": "literal",
         "dataType": "null",
         "value": "NULL"
       }
     },
     {
       "name": "big_number",
       "type": "long",
       "nullable": true,
       "autoIncrement": false,
       "defaultValue": {
         "type": "literal",
         "dataType": "null",
         "value": "NULL"
       }
     },
     {
       "name": "small_number",
       "type": "integer",
       "nullable": true,
       "autoIncrement": false,
       "defaultValue": {
         "type": "literal",
         "dataType": "null",
         "value": "NULL"
       }
     },
     {
       "name": "tiny_number",
       "type": "short",
       "nullable": true,
       "autoIncrement": false,
       "defaultValue": {
         "type": "literal",
         "dataType": "null",
         "value": "NULL"
       }
     }
   ]
   ```
   As the response shows, Gravitino reports its own unified types rather than the native types of the original database. I have asked the Gravitino project for official confirmation of this.
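
To make the type-mapping idea concrete, here is a minimal sketch in Java. It is only an illustration under stated assumptions: the class `GravitinoTypeMappingSketch`, the method `toTargetType`, and the enum `TargetType` are hypothetical names that do not exist in SeaTunnel, and the enum is a stand-in for SeaTunnel's real data types. Only the three Gravitino type names that appear in the response above (`integer`, `long`, `short`) are mapped; every other type fails loudly until its mapping has been confirmed.

```java
import java.util.Locale;
import java.util.Map;

public class GravitinoTypeMappingSketch {

    /** Hypothetical stand-in for SeaTunnel data types; not a SeaTunnel class. */
    public enum TargetType { SMALLINT, INT, BIGINT }

    // Only the Gravitino type names seen in the sample response are mapped here.
    // Further entries should be added once their semantics are confirmed.
    private static final Map<String, TargetType> MAPPING = Map.of(
            "short", TargetType.SMALLINT,
            "integer", TargetType.INT,
            "long", TargetType.BIGINT);

    /** Maps a Gravitino column type name to the stand-in target type. */
    public static TargetType toTargetType(String gravitinoType) {
        if (gravitinoType == null) {
            throw new IllegalArgumentException("Gravitino type must not be null");
        }
        TargetType target = MAPPING.get(gravitinoType.trim().toLowerCase(Locale.ROOT));
        if (target == null) {
            // Fail instead of guessing, so unmapped types are noticed early.
            throw new IllegalArgumentException("Unmapped Gravitino type: " + gravitinoType);
        }
        return target;
    }

    public static void main(String[] args) {
        // Column types in the order they appear in the sample response.
        for (String type : new String[] {"integer", "long", "integer", "short"}) {
            System.out.println(type + " -> " + toTargetType(type));
        }
    }
}
```

Rejecting unknown type names rather than falling back to a string type is a deliberate choice in this sketch: complex types (nested objects, documents) still need an agreed mapping, and a silent fallback would hide that gap.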

