shnapz commented on issue #33981:
URL: https://github.com/apache/beam/issues/33981#issuecomment-3700722605

   > To submit lineage data in OpenLineage format, you need to know the 
source-sink pairs.
   
   True, but things are simpler if you only need job-level granularity rather 
than transform-level granularity (treat the job as a single atomic transform 
that has all sources and sinks at once). The class 
[org/apache/beam/sdk/metrics/Lineage.java](https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/metrics/Lineage.java)
 is a central interception point across all IOs, so my PR just modifies this 
class to **add the ability** to substitute lineage metrics with any custom 
implementation (sacrificing transform-level granularity). As far as I can tell, 
it doesn't overlap with your ticket at all, so we are good!
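
   To illustrate the job-level idea, here is a minimal sketch in plain Java. It 
does not use the Beam `Lineage` API or the code from the PR; `JobLevelLineage` 
and all of its methods are hypothetical names invented for this example. It only 
shows the shape of the approach: every source and sink reported by any IO is 
collected into one job-wide set, and every source is then treated as upstream of 
every sink.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical job-level lineage collector; an illustration, not part of Beam. */
final class JobLevelLineage {
  // Sets drop repeated reports of the same dataset name.
  private final Set<String> sources = new LinkedHashSet<>();
  private final Set<String> sinks = new LinkedHashSet<>();

  /** Records a dataset read by the job (name format is an assumption). */
  synchronized void addSource(String name) {
    sources.add(name);
  }

  /** Records a dataset written by the job. */
  synchronized void addSink(String name) {
    sinks.add(name);
  }

  /** Returns a read-only snapshot of all sources seen so far. */
  synchronized Set<String> sources() {
    return Collections.unmodifiableSet(new LinkedHashSet<>(sources));
  }

  /** Returns a read-only snapshot of all sinks seen so far. */
  synchronized Set<String> sinks() {
    return Collections.unmodifiableSet(new LinkedHashSet<>(sinks));
  }

  /** Renders the whole job as one "sources -> sinks" edge. */
  synchronized String describe(String jobName) {
    return jobName + ": " + sources + " -> " + sinks;
  }
}
```

   A custom implementation along these lines could forward the collected sets to 
an external lineage backend once per job. The cost is the one mentioned above: 
it can no longer tell which transform read or wrote which dataset.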
   
   Indeed, metrics are critical at the transform level, and they are also 
useful for cross-worker deduplication (though the implementation is strictly 
runner-specific).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.