This is an automated email from the ASF dual-hosted git repository.
fanng pushed a commit to branch lineage_doc
in repository https://gitbox.apache.org/repos/asf/gravitino.git
The following commit(s) were added to refs/heads/lineage_doc by this push:
new e7c29d28a4 update doc
e7c29d28a4 is described below
commit e7c29d28a4020a266a8ff3fb348d1b240feb7383
Author: fanng <[email protected]>
AuthorDate: Wed Apr 16 11:11:56 2025 +0800
update doc
---
docs/lineage/gravitino-server-lineage.md | 7 +++----
docs/lineage/gravitino-spark-lineage.md | 6 ++----
2 files changed, 5 insertions(+), 8 deletions(-)
diff --git a/docs/lineage/gravitino-server-lineage.md
b/docs/lineage/gravitino-server-lineage.md
index 0cb40179bc..09f7b8fc75 100644
--- a/docs/lineage/gravitino-server-lineage.md
+++ b/docs/lineage/gravitino-server-lineage.md
@@ -13,10 +13,10 @@ Gravitino server provides a pluginable lineage framework to
receive, process, an
| Configuration item | Description
| Default value | Required |
Since Version |
|-----------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------|----------|---------------|
-| `gravitino.lineage.source` | The name of lineage event
source. The default `http` event source will .
| http | No
| 0.9.0 |
+| `gravitino.lineage.source` | The name of lineage event
source.
| http | No
| 0.9.0 |
| `gravitino.lineage.${sourceName}.sourceClass` | The name of the lineage
source class which should implement
`org.apache.gravitino.lineage.source.LineageSource` interface.
| (none)
| No | 0.9.0 |
| `gravitino.lineage.processorClass` | The name of the lineage
processor class which should implement
`org.apache.gravitino.lineage.processor.LineageProcessor` interface. The
default noop processor will do nothing about the run event. |
`org.apache.gravitino.lineage.processor.NoopProcessor` | No | 0.9.0
|
-| `gravitino.lineage.sinks` | The name of lineage event
sinks.
| log | No
| 0.9.0 |
+| `gravitino.lineage.sinks` | The lineage event sink names
(multiple sinks are supported, separated by commas).
| log | No |
0.9.0 |
| `gravitino.lineage.${sinkName}.sinkClass` | The name of the lineage sink
class which should implement `org.apache.gravitino.lineage.sink.LineageSink`
interface.
| (none) | No
| 0.9.0 |
| `gravitino.lineage.queueCapacity` | The total capacity of
lineage event queues. If there are multiple lineage sinks, each sink uses an
isolated event queue with a capacity of `gravitino.lineage.queueCapacity`
divided by the number of sinks. | 10000 | No
| 0.9.0 |
@@ -52,5 +52,4 @@ Log sink will print the log in a separate log file
`gravitino_lineage.log`, you
## High watermark status
-If the lineage sink is slow, the lineage event will heap in the async queue,
the lineage system will enter high watermark status if the queue size is larger
than the capability*0.9. In high watermark status, the lineage source should
implement appropriate retry/logging mechanisms for rejected events to prevent
-system overload. For `http` source, it will return http status code `429` to
the client side.
\ No newline at end of file
+When the lineage sink operates slowly, lineage events accumulate in the async
queue. Once the queue size exceeds 90% of its capacity (high watermark
threshold), the lineage system enters a high watermark status. In this state,
the lineage source must implement retry and logging mechanisms for rejected
events to prevent system overload. For the HTTP source, it will return the `429
Too Many Requests` status code to the client.
\ No newline at end of file
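Reviewer note: the arithmetic behind the `gravitino.lineage.queueCapacity` description and the high watermark paragraph can be illustrated with a minimal sketch. This is not Gravitino code. The function names are hypothetical, the use of integer division is an assumption, and applying the 90% threshold to each sink's own queue is also an assumption, since the doc does not say whether the check is per queue or global.

```python
# Hypothetical helpers illustrating the documented arithmetic only.

def per_sink_capacity(total_capacity: int, num_sinks: int) -> int:
    """Split the total queue capacity evenly across sinks (integer division assumed)."""
    if num_sinks <= 0:
        raise ValueError("num_sinks must be positive")
    return total_capacity // num_sinks


def is_high_watermark(queue_size: int, capacity: int) -> bool:
    """Return True once the queue size exceeds 90% of its capacity.

    Integer arithmetic avoids floating-point rounding at the boundary.
    """
    return queue_size * 10 > capacity * 9


if __name__ == "__main__":
    # Default capacity of 10000 shared by two sinks.
    capacity = per_sink_capacity(10000, 2)
    print(capacity)
    print(is_high_watermark(4500, capacity))
    print(is_high_watermark(4501, capacity))
```

Under these assumptions, a client of the HTTP source would start receiving `429` once a sink's queue passes that threshold and should back off before retrying.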
diff --git a/docs/lineage/gravitino-spark-lineage.md
b/docs/lineage/gravitino-spark-lineage.md
index a2c539f0bb..d26811585e 100644
--- a/docs/lineage/gravitino-spark-lineage.md
+++ b/docs/lineage/gravitino-spark-lineage.md
@@ -20,10 +20,8 @@ By leveraging OpenLineage Spark plugin, Gravitino provides a
separate Spark plug
The Gravitino OpenLineage Spark plugin transforms the Gravitino metalake name
into the dataset namespace. The dataset name varies by dataset type when
generating lineage information.
-If you are using to access the table managed by Gravitino, the dataset name is
as follows:
When using the [Gravitino Spark
connector](/spark-connector/spark-connector.md) to access tables managed by
Gravitino, the dataset name follows this format:
-
| Dataset Type | Dataset name | Example
| Since Version |
|-----------------|------------------------------------------------|----------------------------|---------------|
| Hive catalog | `$GravitinoCatalogName.$schemaName.$tableName` |
`hive_catalog.db.student` | 0.9.0 |
@@ -47,7 +45,7 @@ When accessing datasets by location (e.g., `SELECT * FROM
parquet.$dataset_path`
| GVFS location | `$GravitinoCatalogName.$schemaName.$filesetName` |
`fileset_catalog.schema.fileset_a` | 0.9.0 |
| Other location | location path |
`hdfs://127.0.0.1:9000/tmp/a/student` | 0.9.0 |
-For fileset dataset, the plugin add `fileset-location` facets which contains
the location path.
+For a GVFS location, the plugin adds a `fileset-location` facet that contains
the location path.
```json
"fileset-location" :
@@ -61,7 +59,7 @@ For fileset dataset, the plugin add `fileset-location` facets
which contains the
## How to use
1. Download the Gravitino OpenLineage plugin jar and place it on the Spark
classpath.
-2. Add configuration to the Spark to enable lineage collect.
+2. Add configuration to Spark to enable lineage collection.
Configuration example for Spark shell: