jerryshao commented on code in PR #6946:
URL: https://github.com/apache/gravitino/pull/6946#discussion_r2051913449
##########
docs/lineage/gravitino-server-lineage.md:
##########
@@ -0,0 +1,60 @@
+---
+title: "Gravitino server Lineage support"
+slug: /lineage/gravitino-server-lineage
+keyword: Gravitino OpenLineage
+license: "This software is licensed under the Apache License version 2."
+---
+
+## Overview
+
+Gravitino server provides a pluginable lineage framework to receive, process,
and sink OpenLineage events. By leveraging this, you could do custom process
for the lineage event and sink to your dedicated systems.
+
+## Lineage Configuration
+
+| Configuration item | Description
| Default value
| Required | Since Version |
+|-----------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------|----------|------------------|
+| `gravitino.lineage.source` | The name of lineage event
source.
| http
| No | 0.9.0-incubating |
+| `gravitino.lineage.${sourceName}.sourceClass` | The name of the lineage
source class which should implement
`org.apache.gravitino.lineage.source.LineageSource` interface.
| (none)
| No | 0.9.0-incubating |
+| `gravitino.lineage.processorClass` | The name of the lineage
processor class which should implement
`org.apache.gravitino.lineage.processor.LineageProcessor` interface. The
default noop processor will do nothing about the run event.
|
`org.apache.gravitino.lineage.processor.NoopProcessor` | No |
0.9.0-incubating |
+| `gravitino.lineage.sinks` | The Lineage event sink names
(support multiple sinks separated by commas).
| log
| No | 0.9.0-incubating |
+| `gravitino.lineage.${sinkName}.sinkClass` | The name of the lineage sink
class which should implement `org.apache.gravitino.lineage.sink.LineageSink`
interface.
| (none)
| No | 0.9.0-incubating |
+| `gravitino.lineage.queueCapacity` | The total capacity of
lineage event queues. When there are multiple lineage sinks, each sink utilizes
an isolated event queue. The capacity of each queue is calculated by dividing
the value of `gravitino.lineage.queueCapacity` by the number of sinks. | 10000
| No | 0.9.0-incubating |
+
+## Lineage http source
+
+Http source provides an endpoint which follows [OpenLineage API
spec](https://openlineage.io/apidocs/openapi/) to receive OpenLineage run
event. The following use example:
+
+```shell
+cat <<EOF >source.json
+{
+ "eventType": "START",
+ "eventTime": "2023-10-28T19:52:00.001+10:00",
+ "run": {
+ "runId": "0176a8c2-fe01-7439-87e6-56a1a1b4029f"
+ },
+ "job": {
+ "namespace": "gravitino-namespace",
+ "name": "gravitino-job1"
+ },
+ "inputs": [{
+ "namespace": "gravitino-namespace",
+ "name": "gravitino-table-identifier"
+ }],
+ "producer": "https://github.com/OpenLineage/OpenLineage/blob/v1-0-0/client",
+ "schemaURL":
"https://openlineage.io/spec/1-0-5/OpenLineage.json#/definitions/RunEvent"
+}
+EOF
+
+curl -X POST \
+ -i -H 'Content-Type: application/json' \
+ -d '@source.json' \
+ http://localhost:8090/api/lineage
+```
+
+## Lineage log sink
+
+Log sink will print the log in a separate log file `gravitino_lineage.log`,
you could change the default behavior in `conf/log4j2.properties`.
+
+## High watermark status
+
+When the lineage sink operates slowly, lineage events accumulate in the async
queue. Once the queue size exceeds 90% of its capacity (high watermark
threshold), the lineage system enters a high watermark status. In this state,
the lineage source must implement retry and logging mechanisms for rejected
events to prevent system overload. For the HTTP source, it will return the `429
Too Many Requests` status code to the client.
Review Comment:
Please do not use future tense "will" for the doc.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]