chl-wxp commented on issue #10339:
URL: https://github.com/apache/seatunnel/issues/10339#issuecomment-3752933792
## Core Idea
For **semi-structured and unstructured data sources** (such as FTP, S3, and file systems), SeaTunnel can resolve table schemas from the **Gravitino REST API**.

This does **not** replace the existing inline `schema` configuration. It is a **new, optional mechanism** that is **fully backward compatible**. Schemas are resolved in the following priority order:
### 1 If `schema` Is Defined, Always Use It
If a connector configuration contains a `schema` block, SeaTunnel **must ignore Gravitino** and use the inline schema directly.
```hocon
FtpFile {
  path = "/tmp/seatunnel/sink/text"
  host = "192.168.31.48"
  port = 21
  user = tyrantlucifer
  password = tianchao
  file_format_type = "text"
  schema = {
    fields {
      name = string
      age = int
    }
  }
  field_delimiter = "#"
}
```
### 2 Using Gravitino via env (Recommended Mode)
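SeaTunnel already integrates with Gravitino Metalake at the engine level. When the metalake is configured in the `env` block, every non-relational source can reference a Gravitino table either by name (`schema_path`) or by its full REST URL (`schema_url`).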
```hocon
env {
  metalake_enabled = true
  metalake_type = "gravitino"
  metalake_url = "http://localhost:8090/api/metalakes/metalake_name/catalogs/"
}
```
#### 2.1 Use `schema_path`
`schema_path` names the table as `catalog.schema.table`, relative to `metalake_url`.
```hocon
FtpFile {
  path = "/tmp/seatunnel/sink/text"
  host = "192.168.31.48"
  port = 21
  user = tyrantlucifer
  password = tianchao
  file_format_type = "text"
  schema_path = "catalog_name.ykw.test_table"
  field_delimiter = "#"
}
```
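The expansion from `schema_path` to a table URL is not spelled out above. A minimal sketch, assuming the three dotted parts are appended to `metalake_url` using the same `/catalogs/{catalog}/schemas/{schema}/tables/{table}` layout as the `schema_url` in Section 2.2; `SchemaPathResolver` and `toTableUrl` are hypothetical names:

```java
import java.util.Arrays;

/**
 * Hypothetical helper (not an existing SeaTunnel class): expands a dotted
 * schema_path such as "catalog_name.ykw.test_table" into a Gravitino table URL
 * underneath the configured metalake_url.
 */
final class SchemaPathResolver {

    private SchemaPathResolver() {}

    static String toTableUrl(String metalakeUrl, String schemaPath) {
        // limit -1 keeps trailing empty parts, so "a.b.c." is rejected as well
        String[] parts = schemaPath.split("\\.", -1);
        if (parts.length != 3 || Arrays.stream(parts).anyMatch(String::isEmpty)) {
            throw new IllegalArgumentException(
                    "schema_path must look like catalog.schema.table, got: " + schemaPath);
        }
        // metalake_url is expected to end in ".../catalogs/"; add the slash if it is missing
        String base = metalakeUrl.endsWith("/") ? metalakeUrl : metalakeUrl + "/";
        return base + parts[0] + "/schemas/" + parts[1] + "/tables/" + parts[2];
    }
}
```

With the values above, `toTableUrl("http://localhost:8090/api/metalakes/metalake_name/catalogs/", "catalog_name.ykw.test_table")` yields `http://localhost:8090/api/metalakes/metalake_name/catalogs/catalog_name/schemas/ykw/tables/test_table`. Names that themselves contain dots would need escaping, which this sketch does not handle.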
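#### 2.2 Use `schema_url`
`schema_url` is the table's full Gravitino REST URL, so it does not need `metalake_url` to be resolved.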
```hocon
FtpFile {
  path = "/tmp/seatunnel/sink/text"
  host = "192.168.31.48"
  port = 21
  user = tyrantlucifer
  password = tianchao
  file_format_type = "text"
  schema_url = "http://localhost:8090/api/metalakes/laowang_test/catalogs/221-pgsql/schemas/ykw/tables/all_type"
  field_delimiter = "#"
}
```
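Because `schema_url` already names the metalake, catalog, schema, and table, it can be validated without any other setting. A minimal sketch, assuming the URL follows the path shape in the example above (which matches Gravitino's load-table endpoint); `SchemaUrlParser` is a hypothetical name:

```java
import java.net.URI;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: splits a Gravitino table URL into its four coordinates. */
final class SchemaUrlParser {

    // Path shape: /api/metalakes/{metalake}/catalogs/{catalog}/schemas/{schema}/tables/{table}
    private static final Pattern TABLE_PATH = Pattern.compile(
            "/api/metalakes/([^/]+)/catalogs/([^/]+)/schemas/([^/]+)/tables/([^/]+)/?");

    private SchemaUrlParser() {}

    /** Returns {metalake, catalog, schema, table}; throws if the URL has another shape. */
    static String[] parse(String schemaUrl) {
        // URI.create throws IllegalArgumentException on malformed syntax
        String path = URI.create(schemaUrl).getPath();
        Matcher m = TABLE_PATH.matcher(path);
        if (path == null || !m.matches()) {
            throw new IllegalArgumentException("Not a Gravitino table URL: " + schemaUrl);
        }
        return new String[] {m.group(1), m.group(2), m.group(3), m.group(4)};
    }
}
```

For the example URL, `parse` returns `{"laowang_test", "221-pgsql", "ykw", "all_type"}`. Validating up front turns a typo into a clear configuration error instead of a failed HTTP call.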
### 3 Fallback to OS Environment Variables
If Gravitino is not configured in `env`, SeaTunnel reads the same keys from OS environment variables:
```text
metalake_enabled
metalake_type
metalake_url
```
The behavior is identical to the env configuration in Section 2.
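The env-then-OS fallback can be sketched as below. This assumes that "Gravitino is configured in `env`" means the `env` block contains `metalake_enabled`; `MetalakeSettings` is a hypothetical class, and in a real job `osEnv` would be `System.getenv()`:

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical holder for the metalake keys listed above. */
final class MetalakeSettings {
    final String type;
    final String url; // may be null, e.g. when only schema_url is used

    private MetalakeSettings(String type, String url) {
        this.type = type;
        this.url = url;
    }

    /**
     * Reads the metalake keys from the job's env block and falls back to OS
     * environment variables only when the env block does not set metalake_enabled.
     * Returns empty when the chosen source does not enable the metalake.
     */
    static Optional<MetalakeSettings> resolve(Map<String, String> envBlock, Map<String, String> osEnv) {
        Map<String, String> source = envBlock.containsKey("metalake_enabled") ? envBlock : osEnv;
        // Boolean.parseBoolean(null) is false, so a missing key counts as disabled
        if (!Boolean.parseBoolean(source.get("metalake_enabled"))) {
            return Optional.empty();
        }
        return Optional.of(new MetalakeSettings(source.get("metalake_type"), source.get("metalake_url")));
    }
}
```

One design choice to settle: under this rule, `metalake_enabled = false` in `env` disables Gravitino even if the OS variables enable it.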
### 4 Standalone Gravitino Configuration at Connector Level
If no metalake is configured globally (neither in `env` nor through OS environment variables), a connector can configure Gravitino directly.
#### 4.1 Use `schema_url`
```hocon
FtpFile {
  path = "/tmp/seatunnel/sink/text"
  host = "192.168.31.48"
  port = 21
  user = tyrantlucifer
  password = tianchao
  file_format_type = "text"
  metalake_type = "gravitino"
  schema_url = "http://localhost:8090/api/metalakes/laowang_test/catalogs/221-pgsql/schemas/ykw/tables/all_type"
  field_delimiter = "#"
}
```
#### 4.2 Use `schema_path`
```hocon
FtpFile {
  path = "/tmp/seatunnel/sink/text"
  host = "192.168.31.48"
  port = 21
  user = tyrantlucifer
  password = tianchao
  file_format_type = "text"
  metalake_type = "gravitino"
  metalake_url = "http://localhost:8090/api/metalakes/metalake_name/catalogs/"
  schema_path = "catalog_name.ykw.test_table"
  field_delimiter = "#"
}
```
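Taken together, Sections 1 to 4 define a resolution order. A minimal sketch of that order, assuming plain maps for the connector options, the `env` block, and the OS environment, and assuming global settings win whenever both global and connector-level settings are present (the sections above do not cover that case). `SchemaSourceResolver` and its `Source` values are hypothetical, and the sketch only checks whether each source enables the metalake:

```java
import java.util.Map;

/** Hypothetical sketch of the resolution order in Sections 1 to 4 (not SeaTunnel's actual code). */
final class SchemaSourceResolver {

    enum Source { INLINE_SCHEMA, GLOBAL_ENV, OS_ENV, CONNECTOR, NONE }

    private SchemaSourceResolver() {}

    static Source resolve(Map<String, ?> connector, Map<String, String> envBlock, Map<String, String> osEnv) {
        // 1. An inline schema block always wins; Gravitino is never consulted.
        if (connector.containsKey("schema")) {
            return Source.INLINE_SCHEMA;
        }
        boolean hasUrl = connector.containsKey("schema_url");
        if (!hasUrl && !connector.containsKey("schema_path")) {
            return Source.NONE; // nothing refers to Gravitino; left to the connector
        }
        // 2. Metalake enabled in the job's env block.
        if (isEnabled(envBlock)) {
            return Source.GLOBAL_ENV;
        }
        // 3. Metalake enabled through OS environment variables.
        if (isEnabled(osEnv)) {
            return Source.OS_ENV;
        }
        // 4. Connector-level settings: metalake_type plus either schema_url,
        //    or schema_path together with metalake_url.
        if (connector.containsKey("metalake_type") && (hasUrl || connector.containsKey("metalake_url"))) {
            return Source.CONNECTOR;
        }
        throw new IllegalArgumentException(
                "schema_path/schema_url is set but no metalake is configured in env, OS env, or the connector");
    }

    private static boolean isEnabled(Map<String, String> settings) {
        return Boolean.parseBoolean(settings.get("metalake_enabled"));
    }
}
```

For the Section 4.1 example (`metalake_type` and `schema_url`, nothing configured globally), `resolve` returns `CONNECTOR`; adding a `schema` block to the same connector makes it return `INLINE_SCHEMA`.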
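### 5 Select the HTTP Detector by `metalake_type`
SeaTunnel looks up the REST API HTTP detector that matches the configured `metalake_type` (for example, `gravitino`).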
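One way to select the detector is a registry keyed by `metalake_type`. The sketch below assumes a `SchemaDetector` interface with a single fetch method and a Gravitino implementation built on the JDK's `java.net.http.HttpClient` (Java 11+). `SchemaDetector`, `GravitinoRestDetector`, and `DetectorRegistry` are hypothetical names, and the `Accept` header value is the one used in Gravitino's REST examples:

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Locale;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical detector abstraction: fetches the raw table metadata as JSON text. */
interface SchemaDetector {
    String fetchTableJson(String tableUrl) throws IOException, InterruptedException;
}

/** Calls the Gravitino REST API with a plain HTTP GET. */
final class GravitinoRestDetector implements SchemaDetector {
    private final HttpClient client = HttpClient.newHttpClient();

    @Override
    public String fetchTableJson(String tableUrl) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(tableUrl))
                .header("Accept", "application/vnd.gravitino.v1+json")
                .GET()
                .build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Gravitino returned HTTP " + response.statusCode() + " for " + tableUrl);
        }
        return response.body();
    }
}

/** Maps each supported metalake_type to a detector factory. */
final class DetectorRegistry {
    private static final Map<String, Supplier<SchemaDetector>> DETECTORS =
            Map.of("gravitino", GravitinoRestDetector::new);

    private DetectorRegistry() {}

    static SchemaDetector forType(String metalakeType) {
        Supplier<SchemaDetector> factory = DETECTORS.get(metalakeType.toLowerCase(Locale.ROOT));
        if (factory == null) {
            throw new IllegalArgumentException("Unsupported metalake_type: " + metalakeType);
        }
        return factory.get();
    }
}
```

Supporting another metadata service would then mean adding one entry to `DETECTORS`. Discovering detectors through Java's `ServiceLoader` would be an alternative, which this sketch does not use.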
### 6 Build the `CatalogTable` from the REST Response
The detector calls the assembled URL to fetch the response body, passes it to the mapper for type matching, and then builds the `CatalogTable`.
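The mapper step can be sketched as a lookup from Gravitino column types to the SeaTunnel type names used in an inline `schema` block (Section 1). This assumes the response body has already been parsed into column name/type pairs, and that Gravitino reports types as strings such as `integer`, `long`, `varchar(255)`, or `decimal(10,2)`; the exact strings should be checked against Gravitino's type documentation. `GravitinoTypeMapper` is a hypothetical name, and building the actual `CatalogTable` from the result is left out:

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical mapper from Gravitino column type strings to SeaTunnel schema
 * type names (the names used in an inline schema block, e.g. "string", "int").
 */
final class GravitinoTypeMapper {

    private static final Map<String, String> PRIMITIVES = Map.ofEntries(
            Map.entry("boolean", "boolean"),
            Map.entry("byte", "tinyint"),
            Map.entry("short", "smallint"),
            Map.entry("integer", "int"),
            Map.entry("long", "bigint"),
            Map.entry("float", "float"),
            Map.entry("double", "double"),
            Map.entry("string", "string"),
            Map.entry("date", "date"),
            Map.entry("time", "time"),
            Map.entry("timestamp", "timestamp"),
            Map.entry("binary", "bytes"));

    private GravitinoTypeMapper() {}

    static String toSeaTunnelType(String gravitinoType) {
        String t = gravitinoType.trim().toLowerCase(Locale.ROOT);
        if (t.startsWith("decimal(")) {
            return t; // precision and scale are carried over unchanged
        }
        if (t.startsWith("varchar(") || t.startsWith("char(")) {
            return "string"; // length limits are dropped
        }
        String mapped = PRIMITIVES.get(t);
        if (mapped == null) {
            throw new IllegalArgumentException("No SeaTunnel mapping for Gravitino type: " + gravitinoType);
        }
        return mapped;
    }

    /** Maps each column in order, producing the fields of an inline schema block. */
    static Map<String, String> toSchemaFields(Map<String, String> gravitinoColumns) {
        Map<String, String> fields = new LinkedHashMap<>();
        gravitinoColumns.forEach((name, type) -> fields.put(name, toSeaTunnelType(type)));
        return fields;
    }
}
```

Failing on an unknown type, rather than silently defaulting to `string`, reports unsupported columns when the job starts instead of when data is read.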