-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/73081/
-----------------------------------------------------------
(Updated Dec. 11, 2020, 9:47 p.m.)
Review request for atlas, Ashutosh Mestry, Madhan Neethiraj, Nikhil Bonte, and
Sarath Subramanian.
Bugs: ATLAS-4076
https://issues.apache.org/jira/browse/ATLAS-4076
Repository: atlas
Description (updated)
-------
Observations:
=============
Create a Hive table and attach a classification to it in Atlas. Enable
propagation on the attached classification.
When you derive a new table from this Hive table, the new table will have the
propagated classification, as expected.
However, the entity audits of the newly derived table have multiple "Propagated
Classification Added" entries.
If table derivation is done using Hive Beeline, there are 5 such entries per
propagated classification.
Using Spark-shell, 3 such entries were observed per propagated classification.
Expected behaviour is to have just 1 entry per propagated classification.
Analysis:
=========
After detecting a relationship and creating the relationship edge, the
propagated entities (classifications) are notified to entityChangeListener
through entityChangeNotifier. However, the details of the propagated entities
are not passed directly to the notifier, but through the request context
(buffered in the addedPropagation list).
After processing every edge, the AtlasRelationshipStore manager sends a
notification to entityChangeListener, which simply gets all the items in the
request context buffer list.
In this issue, Hive sends an event that has multiple relationships, and only
one relationship has propagated entities. Sending multiple notifications is
correct, but each notification processes the same buffer list again, which is
wrong.
The following relationships were created:
Created relationship edge from [hive_table] --> [hive_storagedesc] using edge
label: [__hive_table.sd]
Created relationship edge from [hive_table] --> [hive_column] using edge label:
[__hive_table.columns]
Created relationship edge from [hive_table] --> [hive_table_ddl] using edge
label: [r:hive_table_ddl_queries]
Created relationship edge from [hive_table] --> [hive_db] using edge label:
[__hive_table.db]
Created relationship edge from [hive_process] --> [hive_process_execution]
using edge label: [r:hive_process_process_executions]
Created relationship edge from [hive_process] --> [hive_table] using edge
label: [__Process.outputs]
Created relationship edge from [hive_process] --> [hive_table] using edge
label: [__Process.inputs]
===================================================================================================
Created relationship edge from [hive_column_lineage] --> [hive_column] using
edge label: [__Process.outputs]
Created relationship edge from [hive_column_lineage] --> [hive_column] using
edge label: [__Process.inputs]
Created relationship edge from [hive_column_lineage] --> [hive_process] using
edge label: [__hive_column_lineage.query]
In the above list, the highlighted relationship has the propagated
classification, but the 3 subsequent relationships send 3 more notifications,
resulting in 3 extra entries for the same classification in the entity audits.
At the end, entityChangeNotifier, while processing mutated entities, explicitly
notifies for any pending propagated entities, and the buffer list in the
request context is processed once again, resulting in a 4th extra entry in the
audits.
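The sequence described above can be reproduced in isolation with a minimal
sketch. All class and method names here (SketchRequestContext, SketchNotifier,
countAuditEntries) and the classification name "PII" are hypothetical stand-ins,
not the actual Atlas code; the sketch only assumes a request-scoped list that
the notifier reads without clearing.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a request-scoped buffer; not the real Atlas RequestContext.
class SketchRequestContext {
    private final List<String> addedPropagations = new ArrayList<>();

    void recordAddedPropagation(String classification) {
        addedPropagations.add(classification);
    }

    List<String> getAddedPropagations() {
        return addedPropagations;
    }
}

// Hypothetical stand-in for the notifier; it reads the buffer but never clears it.
class SketchNotifier {
    final List<String> auditEntries = new ArrayList<>();

    void notifyPropagatedEntities(SketchRequestContext context) {
        for (String classification : context.getAddedPropagations()) {
            auditEntries.add("Propagated Classification Added: " + classification);
        }
    }
}

class DuplicateAuditSketch {
    // Simulates one request: the buffer is filled once, then the notifier runs
    // once per later edge and once more at the end of the request.
    static int countAuditEntries(int edgesAfterPropagation) {
        SketchRequestContext context = new SketchRequestContext();
        SketchNotifier notifier = new SketchNotifier();

        // The edge that actually propagates the classification.
        context.recordAddedPropagation("PII");
        notifier.notifyPropagatedEntities(context);

        // Later edges propagate nothing, but each still triggers a notification.
        for (int i = 0; i < edgesAfterPropagation; i++) {
            notifier.notifyPropagatedEntities(context);
        }

        // Final notification while processing mutated entities.
        notifier.notifyPropagatedEntities(context);

        return notifier.auditEntries.size();
    }

    public static void main(String[] args) {
        // 1 expected entry + 3 from later edges + 1 from the final pass
        System.out.println(countAuditEntries(3)); // prints 5
    }
}
```

With 3 later edges the sketch yields 5 audit entries, matching the count seen
with Hive Beeline.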
Fix:
====
One option was to send the details of the propagated entities directly to the
notifier and not rely on the request context. It required a lot of code changes.
The other option was to clear the buffer in the request context after
processing it in entityChangeNotifier.
This review request takes the second approach.
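A minimal sketch of the second approach, under the assumption that the buffer
is a plain list owned by the request context. The names below
(ClearingRequestContext, ClearingNotifier, clearAddedPropagations,
countAuditEntries) and the classification name "PII" are hypothetical and are
not taken from the actual diff; see the diff link below for the real change.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical request-scoped buffer with an explicit clear operation.
class ClearingRequestContext {
    private final List<String> addedPropagations = new ArrayList<>();

    void recordAddedPropagation(String classification) {
        addedPropagations.add(classification);
    }

    List<String> getAddedPropagations() {
        return addedPropagations;
    }

    void clearAddedPropagations() {
        addedPropagations.clear();
    }
}

// Hypothetical notifier that clears the buffer once it has processed it.
class ClearingNotifier {
    final List<String> auditEntries = new ArrayList<>();

    void notifyPropagatedEntities(ClearingRequestContext context) {
        for (String classification : context.getAddedPropagations()) {
            auditEntries.add("Propagated Classification Added: " + classification);
        }
        // Clearing here means a later notification finds nothing left to repeat.
        context.clearAddedPropagations();
    }
}

class ClearedBufferSketch {
    // One classification is buffered, then the notifier runs several times.
    static int countAuditEntries(int totalNotifications) {
        ClearingRequestContext context = new ClearingRequestContext();
        ClearingNotifier notifier = new ClearingNotifier();

        context.recordAddedPropagation("PII");
        for (int i = 0; i < totalNotifications; i++) {
            notifier.notifyPropagatedEntities(context);
        }
        return notifier.auditEntries.size();
    }
}
```

In this sketch, calling countAuditEntries(5) returns 1: only the first
notification finds the buffered classification, and the remaining four see an
empty list.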
Diffs
-----
repository/src/main/java/org/apache/atlas/repository/store/graph/v2/AtlasEntityChangeNotifier.java
32ad65e7a
server-api/src/main/java/org/apache/atlas/RequestContext.java 32ffddde1
Diff: https://reviews.apache.org/r/73081/diff/1/
Testing
-------
Manual testing was done using both Hive and Spark.
Precommit tests passed:
https://ci-builds.apache.org/job/Atlas/job/PreCommit-ATLAS-Build-Test/263/console
Thanks,
Deep Singh