xc-jianghan commented on issue #4814:
URL: https://github.com/apache/kyuubi/issues/4814#issuecomment-3882915087

   Hi,
   
   I've worked through the earlier issues and successfully deployed the Kyuubi 
Spark Lineage plugin. However, I noticed a behavior difference compared to 
Atlas's native Hive hook:
   
   For the same recurring INSERT job, the Hive hook appears to update the 
existing entity's timestamp rather than creating new entities. In contrast, the 
Spark Lineage plugin generates a new spark_process and spark_column_lineage 
entity for every Spark job execution. For scheduled/recurring jobs, this means 
the number of spark_process and spark_column_lineage entities grows without 
bound over time, which doesn't seem reasonable.
   
   Is it possible to adapt the Spark Lineage plugin to behave more like the 
Hive hook (e.g., deduplicate/upsert instead of creating new entities each run)? 
I’d really appreciate any guidance or suggestions on how to approach this.
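   One direction I was considering (a sketch only, not how the plugin currently 
works): Atlas deduplicates entities of a given type by their unique 
`qualifiedName` attribute, so if the plugin derived the spark_process 
qualifiedName deterministically from the job's inputs and outputs instead of 
from the per-run execution id, repeated runs of the same job should resolve to 
the same entity and be upserted. The class and method names below are 
hypothetical, for illustration only:

   ```java
   import java.nio.charset.StandardCharsets;
   import java.security.MessageDigest;
   import java.util.List;
   import java.util.TreeSet;

   public class StableLineageId {
       // Hypothetical helper: build a deterministic qualifiedName for a
       // spark_process entity from its input/output tables, so the same
       // recurring job always maps to one Atlas entity (upsert) instead
       // of a new entity per execution.
       static String stableQualifiedName(List<String> inputs,
                                         List<String> outputs,
                                         String cluster) throws Exception {
           // Sort tables so argument order does not change the result.
           StringBuilder key = new StringBuilder();
           for (String in : new TreeSet<>(inputs))  key.append("in:").append(in).append(';');
           for (String out : new TreeSet<>(outputs)) key.append("out:").append(out).append(';');

           // Hash the canonical key; a short hex prefix keeps the name readable.
           byte[] digest = MessageDigest.getInstance("SHA-256")
                   .digest(key.toString().getBytes(StandardCharsets.UTF_8));
           StringBuilder hex = new StringBuilder();
           for (byte b : digest) hex.append(String.format("%02x", b));
           return "spark_process:" + hex.substring(0, 16) + "@" + cluster;
       }

       public static void main(String[] args) throws Exception {
           // Two runs of the same INSERT job yield the same qualifiedName...
           String run1 = stableQualifiedName(List.of("db.src"), List.of("db.dst"), "prod");
           String run2 = stableQualifiedName(List.of("db.src"), List.of("db.dst"), "prod");
           System.out.println(run1.equals(run2)); // true
           // ...while a different job yields a different one.
           String other = stableQualifiedName(List.of("db.other"), List.of("db.dst"), "prod");
           System.out.println(run1.equals(other)); // false
       }
   }
   ```

   With a stable qualifiedName like this, submitting the entity through the 
Atlas API's create-or-update path should refresh the existing entity rather 
than accumulating new ones, which I believe is roughly what the Hive hook does.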
   
   Thanks a lot


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

