Radhika Kundam created ATLAS-5320:
-------------------------------------

             Summary: Distributed Notification Processing
                 Key: ATLAS-5320
                 URL: https://issues.apache.org/jira/browse/ATLAS-5320
             Project: Atlas
          Issue Type: New Feature
            Reporter: Radhika Kundam
            Assignee: Radhika Kundam


Current entity and lineage processing in Atlas is mostly single-threaded to 
maintain message order, which limits scalability. HMS messages create entities, 
and HS2 messages create lineage based on those entities. To ensure correct 
lineage, we currently serialize all processing, which becomes a performance 
bottleneck.

As a solution for this, Introduce a proof-of-concept for scalable message 
processing in Apache Atlas by using multiple Kafka topics based on a key 
({{{}dbName.tableName{}}}). This will enable parallel processing of HMS and HS2 
messages, improve throughput, and reduce bottlenecks caused by single-threaded 
lineage creation.

Implementation details:
 * HMS messages are routed to Kafka partitions based on 
{{{}dbName.tableName{}}}.

 * HS2 messages are routed to *all* relevant partitions based on input/output 
tables.

 * Messages are processed in parallel by multiple consumer threads.

 * Deduplication and shell entity handling is incorporated.

Attached architectural design document.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to