Radhika Kundam created ATLAS-5320:
-------------------------------------
Summary: Distributed Notification Processing
Key: ATLAS-5320
URL: https://issues.apache.org/jira/browse/ATLAS-5320
Project: Atlas
Issue Type: New Feature
Reporter: Radhika Kundam
Assignee: Radhika Kundam
Current entity and lineage processing in Atlas is mostly single-threaded to
maintain message order, which limits scalability. HMS messages create entities,
and HS2 messages create lineage based on those entities. To ensure correct
lineage, we currently serialize all processing, which becomes a performance
bottleneck.
As a solution for this, Introduce a proof-of-concept for scalable message
processing in Apache Atlas by using multiple Kafka topics based on a key
({{{}dbName.tableName{}}}). This will enable parallel processing of HMS and HS2
messages, improve throughput, and reduce bottlenecks caused by single-threaded
lineage creation.
Implementation details:
* HMS messages are routed to Kafka partitions based on
{{{}dbName.tableName{}}}.
* HS2 messages are routed to *all* relevant partitions based on input/output
tables.
* Messages are processed in parallel by multiple consumer threads.
* Deduplication and shell entity handling is incorporated.
Attached architectural design document.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)