[https://issues.apache.org/jira/browse/KUDU-2895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16930675#comment-16930675]
Grant Henke commented on KUDU-2895:
-----------------------------------
Upon further investigation, it looks like we should avoid making the generic
lineage file the only way to report a lineage/audit event. Instead, we could
offer it as the default option but build the feature with direct Atlas client
usage in mind.
To do this, we should leverage a Java subprocess service, which the Ranger
integration can use as well.
That makes this integration fairly straightforward. Everywhere we perform an
authorization check, we can call a "logAuditEvent" function immediately
afterward to forward the information to the audit functionality. Then, based
on the Kudu server configuration, it can either log the event to a file or
forward it to an Atlas integration.
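A minimal sketch of that flow, written in Python for brevity rather than in the server's C++ or the Java subprocess. Every name here is hypothetical: log_audit_event mirrors the proposed "logAuditEvent" hook, the "audit_sink" config key is made up, and AtlasAuditSink does not call any real Atlas API; it only hands each event to a callable standing in for the subprocess/Atlas client.

```python
import json
from dataclasses import asdict, dataclass
from typing import Callable, Dict, Optional, TextIO


@dataclass(frozen=True)
class AuditEvent:
    """One authorization decision (hypothetical fields)."""
    user: str
    action: str  # e.g. "CREATE", "ALTER", "DELETE", "SCAN"
    table: str
    allowed: bool


class FileAuditSink:
    """Default sink: append one JSON line per event to a text stream."""

    def __init__(self, out: TextIO) -> None:
        self._out = out

    def emit(self, event: AuditEvent) -> None:
        self._out.write(json.dumps(asdict(event), sort_keys=True) + "\n")


class AtlasAuditSink:
    """Stand-in for the Atlas path: forwards each event to a callable that
    represents the (hypothetical) Java subprocess / Atlas client."""

    def __init__(self, send: Callable[[AuditEvent], None]) -> None:
        self._send = send

    def emit(self, event: AuditEvent) -> None:
        self._send(event)


def make_sink(config: Dict[str, str], out: Optional[TextIO] = None,
              send: Optional[Callable[[AuditEvent], None]] = None):
    """Pick the sink from server configuration; "file" is the default."""
    mode = config.get("audit_sink", "file")
    if mode == "file":
        if out is None:
            raise ValueError("file sink needs an output stream")
        return FileAuditSink(out)
    if mode == "atlas":
        if send is None:
            raise ValueError("atlas sink needs a send callable")
        return AtlasAuditSink(send)
    raise ValueError(f"unknown audit_sink: {mode!r}")


def log_audit_event(sink, event: AuditEvent) -> None:
    """Hypothetical counterpart of the proposed "logAuditEvent" hook."""
    sink.emit(event)


def authorize(sink, check: Callable[[str, str, str], bool],
              user: str, action: str, table: str) -> bool:
    """Run the authorization check, then log the outcome right after it."""
    allowed = check(user, action, table)
    log_audit_event(sink, AuditEvent(user, action, table, allowed))
    return allowed
```

For example, with {"audit_sink": "file"} and an io.StringIO as the output stream, authorize(sink, lambda u, a, t: u == "alice", "alice", "CREATE", "db.t") returns True and writes the line {"action": "CREATE", "allowed": true, "table": "db.t", "user": "alice"}. Keeping the sink choice in one place means the authorization call sites stay the same whichever backend is configured.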
We can use the other Atlas integrations and models to help define our
integration in the Atlas plugin. It will likely be very similar to, or derived
from, the Ranger model we define as part of the Ranger work.
* https://github.com/apache/atlas/tree/master/addons/models/1000-Hadoop
* https://github.com/apache/atlas/tree/master/addons/impala-bridge/src/main/java/org/apache/atlas/impala
> Native Apache Atlas Support
> ---------------------------
>
> Key: KUDU-2895
> URL: https://issues.apache.org/jira/browse/KUDU-2895
> Project: Kudu
> Issue Type: New Feature
> Reporter: Grant Henke
> Priority: Major
> Labels: roadmap-candidate
>
> This tracks adding lineage support to Kudu and Apache Atlas.
> A few notes based on some initial research:
> * It probably makes sense to generate a generic lineage file that Apache
> Atlas can consume for lineage.
> ** This avoids the need for Java interaction in the server
> ** This is the approach Impala uses
> ** See ATLAS-3183 and
> [https://impala.apache.org/docs/build3x/html/topics/impala_lineage.html#lineage]
> * Creating lineage entries for table "DDL" operations makes sense as a first step
> ** CREATE, ALTER, DELETE
> ** This is what HBase seems to do: [https://atlas.apache.org/Hook-HBase.html]
> ** "Only the namespace, table and column-family create/update/delete
> operations are captured by Atlas HBase hook"
> * The need for lineage information from scans is unclear
> ** It would be extremely fine-grained and difficult to interpret.
> ** Instead, lineage from the tools doing the scanning (Impala, Spark, etc.)
> would be more interpretable.