[ 
https://issues.apache.org/jira/browse/ATLAS-58?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14746275#comment-14746275
 ] 

Venkatesh Seetharam commented on ATLAS-58:
------------------------------------------

Sorry to chime in late.

bq. The server handles de-duping of entities based on the unique attribute of 
the entity
Isn't it cheaper to do it on the client? The client should also have the most 
context.

bq. 1. Concept of service that are started and stopped at atlas start and stop
Can you please elaborate?

bq. 2. De-duping of entities on server based on any unique attribute for the 
entity. If entity doesn't have any unique attribute, de-duping is not done and 
new entity is created
Can we have an extensible strategy interface instead? The unique attribute 
would be the default, and the fallback could be a concatenation of all fields 
or something else that can be plugged in.
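
To make the suggestion concrete, here is a rough sketch of what such a 
pluggable strategy could look like. Every name in it (DedupStrategy, 
UniqueAttributeStrategy, AllFieldsStrategy, ChainedStrategy) is made up for 
illustration and is not an existing Atlas API. Entities are modelled as a type 
name plus a plain attribute map, which is also an assumption.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

// Hypothetical interface: not part of the Atlas code base.
interface DedupStrategy {
    // Returns a key identifying the entity, or empty if this strategy
    // cannot produce one for the given attributes.
    Optional<String> dedupKey(String typeName, Map<String, Object> attributes);
}

// Default: key on a single attribute that is declared unique for the type.
final class UniqueAttributeStrategy implements DedupStrategy {
    private final String uniqueAttribute;

    UniqueAttributeStrategy(String uniqueAttribute) {
        this.uniqueAttribute = uniqueAttribute;
    }

    @Override
    public Optional<String> dedupKey(String typeName, Map<String, Object> attributes) {
        Object value = attributes.get(uniqueAttribute);
        return value == null
                ? Optional.empty()
                : Optional.of(typeName + ":" + uniqueAttribute + "=" + value);
    }
}

// Fallback: concatenate all fields, sorted by name so the key is stable.
final class AllFieldsStrategy implements DedupStrategy {
    @Override
    public Optional<String> dedupKey(String typeName, Map<String, Object> attributes) {
        return Optional.of(typeName + ":" + new TreeMap<>(attributes));
    }
}

// Tries each strategy in order and returns the first key produced.
final class ChainedStrategy implements DedupStrategy {
    private final List<DedupStrategy> chain;

    ChainedStrategy(List<DedupStrategy> chain) {
        this.chain = chain;
    }

    @Override
    public Optional<String> dedupKey(String typeName, Map<String, Object> attributes) {
        for (DedupStrategy strategy : chain) {
            Optional<String> key = strategy.dedupKey(typeName, attributes);
            if (key.isPresent()) {
                return key;
            }
        }
        return Optional.empty();
    }
}
```

With a chain of UniqueAttributeStrategy followed by AllFieldsStrategy, an 
entity without the unique attribute is keyed on all of its fields rather than 
always being created as a new entity. Whether that fallback is desirable is 
exactly the kind of decision a plug-in point would leave open.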

bq. 3. Changed entity submit API to take list of entities instead of just 1 
entity (required for hive hook) - backward incompatible
bq. 4. Moved submit and list from EntityResource to EntitiesResource - backward 
incompatible 
CC [~arpitgupta]

bq. Sending notification is done synchronously. So, this adds to hive command 
execution delay. But this also makes it reliable
IMHO, this is a huge red flag. We may want to stick to what the ATS hook does. 
In fact, we could extend the code there with a shutdown hook and send 
notifications reliably without affecting latency.
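
A minimal sketch of the idea, assuming a background sender thread plus a JVM 
shutdown hook that drains pending messages. This is not the actual ATS hook 
code; AsyncNotifier, notifyAsync and drain are hypothetical names, and the 
sender is passed in as a plain Consumer so the sketch stays independent of any 
messaging client.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical helper: not taken from the Hive ATS hook or from Atlas.
final class AsyncNotifier {
    // Single daemon worker thread: keeps message order and does not
    // block JVM exit on its own.
    private final ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "notifier");
        t.setDaemon(true);
        return t;
    });
    private final Consumer<String> sender;
    private final long drainTimeoutMillis;

    AsyncNotifier(Consumer<String> sender, long drainTimeoutMillis) {
        this.sender = sender;
        this.drainTimeoutMillis = drainTimeoutMillis;
        // On normal JVM shutdown, wait for queued messages to be sent.
        Runtime.getRuntime().addShutdownHook(new Thread(this::drain, "notifier-drain"));
    }

    // Called on the query path: only enqueues the message and returns.
    // Throws RejectedExecutionException if called after drain().
    void notifyAsync(String message) {
        executor.execute(() -> sender.accept(message));
    }

    // Stops accepting new messages and waits up to the timeout for the
    // queued ones. Returns true if everything was sent in time.
    boolean drain() {
        executor.shutdown();
        try {
            return executor.awaitTermination(drainTimeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

The hive command only pays for the enqueue, and the shutdown hook covers the 
normal exit path. Shutdown hooks do not run if the process is killed 
forcibly, so this narrows the window for lost notifications without closing 
it completely.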


> Make hive hook reliable
> -----------------------
>
>                 Key: ATLAS-58
>                 URL: https://issues.apache.org/jira/browse/ATLAS-58
>             Project: Atlas
>          Issue Type: Sub-task
>            Reporter: Shwetha G S
>            Assignee: Shwetha G S
>              Labels: incompatible
>             Fix For: trunk
>
>         Attachments: ATLAS-58-v2.patch, ATLAS-58.patch
>
>
> Currently, the hive hook executes in a background thread pool and is a 
> best-effort approach to registering entities. But this needs to be reliable 
> for data governance to be effective.
> One way is for the hive hook to add the entities to some messaging framework; 
> the atlas server can then read the entities from the messages and register 
> them in atlas. Since posting a message is faster, we can do it synchronously 
> and hence get reliable entity registration.
> We can start with kafka for messaging, but any other messaging framework 
> should be pluggable



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
