[
https://issues.apache.org/jira/browse/ATLAS-58?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14746275#comment-14746275
]
Venkatesh Seetharam commented on ATLAS-58:
------------------------------------------
Sorry to chime in late.
bq. The server handles de-duping of entities based on the unique attribute of
the entity
Isn't it cheaper to do it on the client, which should also have the most
context?
bq. 1. Concept of service that are started and stopped at atlas start and stop
Can you please elaborate?
bq. 2. De-duping of entities on server based on any unique attribute for the
entity. If entity doesn't have any unique attribute, de-duping is not done and
new entity is created
Can we have an extensible strategy interface instead? Matching on the unique
attribute would be the default, and the fallback could be a concatenation of
all fields, or something else that could be plugged in.
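To make the suggestion concrete, here is a minimal sketch of what such a pluggable strategy could look like. All names (DedupStrategy, UniqueAttributeStrategy, AllFieldsStrategy, ChainedStrategy) are hypothetical and do not come from the ATLAS-58 patch or the Atlas code base; entities are modelled as plain maps purely for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

// Hypothetical strategy interface; not part of the Atlas API.
interface DedupStrategy {
    // Returns a key identifying the entity, or empty if this strategy cannot build one.
    Optional<String> dedupKey(Map<String, Object> entity);
}

// Default: key on a single unique attribute, if the entity has it.
final class UniqueAttributeStrategy implements DedupStrategy {
    private final String attribute;

    UniqueAttributeStrategy(String attribute) {
        this.attribute = attribute;
    }

    @Override
    public Optional<String> dedupKey(Map<String, Object> entity) {
        Object value = entity.get(attribute);
        return value == null ? Optional.empty() : Optional.of(attribute + "=" + value);
    }
}

// Fallback: concatenate all fields, sorted by name so the key is deterministic.
final class AllFieldsStrategy implements DedupStrategy {
    @Override
    public Optional<String> dedupKey(Map<String, Object> entity) {
        if (entity.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(new TreeMap<>(entity).toString());
    }
}

// Tries each strategy in order and returns the first key produced.
final class ChainedStrategy implements DedupStrategy {
    private final List<DedupStrategy> chain;

    ChainedStrategy(List<DedupStrategy> chain) {
        this.chain = chain;
    }

    @Override
    public Optional<String> dedupKey(Map<String, Object> entity) {
        for (DedupStrategy strategy : chain) {
            Optional<String> key = strategy.dedupKey(entity);
            if (key.isPresent()) {
                return key;
            }
        }
        return Optional.empty();
    }
}
```

With this shape, the server would create a new entity only when no strategy in the chain yields a key that is already registered, and a deployment could plug in its own strategy without touching the default.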
bq. 3. Changed entity submit API to take list of entities instead of just 1
entity (required for hive hook) - backward incompatible
bq. 4. Moved submit and list from EntityResource to EntitiesResource - backward
incompatible
CC [~arpitgupta]
bq. Sending notification is done synchronously. So, this adds to hive command
execution delay. But this also makes it reliable
IMHO, this is a huge red flag. We may want to stick to what the ATS hook does;
in fact, extend the code there with the shutdown hook and send it reliably
without affecting latency.
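A rough sketch of the pattern I have in mind: hand the notification to a background executor so the query thread returns immediately, and register a JVM shutdown hook that waits a bounded time for pending sends. This is only an illustration under assumptions; AsyncNotifier, notifyAsync and drain are hypothetical names, and this is not the ATS hook's actual code.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical helper; names are illustrative and not taken from Hive or Atlas.
final class AsyncNotifier {
    private final ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "notifier");
        t.setDaemon(true); // the sender thread alone should not keep the JVM alive
        return t;
    });
    private final Consumer<String> sender;
    private final long drainTimeoutSeconds;

    AsyncNotifier(Consumer<String> sender, long drainTimeoutSeconds) {
        this.sender = sender;
        this.drainTimeoutSeconds = drainTimeoutSeconds;
        // On JVM exit, give queued notifications a bounded time to be sent.
        Runtime.getRuntime().addShutdownHook(new Thread(this::drain, "notifier-drain"));
    }

    // Returns immediately; the calling thread never waits on the send itself.
    void notifyAsync(String message) {
        executor.submit(() -> sender.accept(message));
    }

    // Stops accepting work and waits for pending sends.
    // Returns true if everything queued was sent within the timeout.
    boolean drain() {
        executor.shutdown();
        try {
            return executor.awaitTermination(drainTimeoutSeconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Note the trade-off: a shutdown hook covers a normal JVM exit, but not a hard kill or a crash, so notifications still queued at that moment would be lost. Whether that is acceptable is exactly what needs to be weighed against the added latency of a synchronous send.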
> Make hive hook reliable
> -----------------------
>
> Key: ATLAS-58
> URL: https://issues.apache.org/jira/browse/ATLAS-58
> Project: Atlas
> Issue Type: Sub-task
> Reporter: Shwetha G S
> Assignee: Shwetha G S
> Labels: incompatible
> Fix For: trunk
>
> Attachments: ATLAS-58-v2.patch, ATLAS-58.patch
>
>
> Currently, the hive hook executes in a background thread pool and is a
> best-effort approach to registering entities. But this needs to be reliable
> for data governance to be effective.
> One way is: in the hive hook, add the entities to some messaging framework,
> and the atlas server can read the entities from the messages and register
> them in atlas. Since posting a message is fast, we can do it synchronously
> and hence get reliable entity registration.
> We can start with kafka for messaging, but any other messaging framework
> should be pluggable.