[ https://issues.apache.org/jira/browse/ATLAS-5320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Radhika Kundam updated ATLAS-5320:
----------------------------------
Attachment: Apache Atlas - Distributed Notification Processing.pdf
> Distributed Notification Processing
> -----------------------------------
>
> Key: ATLAS-5320
> URL: https://issues.apache.org/jira/browse/ATLAS-5320
> Project: Atlas
> Issue Type: New Feature
> Reporter: Radhika Kundam
> Assignee: Radhika Kundam
> Priority: Major
> Attachments: Apache Atlas - Distributed Notification Processing.pdf
>
>
> Entity and lineage processing in Atlas is currently mostly single-threaded to
> preserve message order, which limits scalability. HMS messages create
> entities, and HS2 messages create lineage based on those entities. To ensure
> correct lineage, all processing is currently serialized, which becomes a
> performance bottleneck.
> As a solution, introduce a proof of concept for scalable message
> processing in Apache Atlas that uses multiple Kafka topics based on a key
> ({{dbName.tableName}}). This will enable parallel processing of HMS and
> HS2 messages, improve throughput, and reduce bottlenecks caused by
> single-threaded lineage creation.
> Implementation details:
> * HMS messages are routed to Kafka partitions based on
> {{dbName.tableName}}.
> * HS2 messages are routed to *all* relevant partitions based on input/output
> tables.
> * Messages are processed in parallel by multiple consumer threads.
> * Deduplication and shell entity handling are incorporated.
> The architectural design document is attached.
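
The routing rules in the implementation details above can be illustrated with a small sketch. This is not the Atlas implementation, and the attached design document remains the authority on the real one. Every name here (`partition_for`, `route_hms`, `route_hs2`, `dispatch`, `Deduplicator`), the message dictionary layout, the CRC32 hash, the partition count, and the id-based deduplication are assumptions made for illustration. Shell entity handling is not covered.

```python
import threading
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # assumed value, for illustration only


def partition_for(key, num_partitions=NUM_PARTITIONS):
    """Map a dbName.tableName key to a partition with a stable hash.

    CRC32 is used because Python's built-in hash() is salted per process.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route_hms(message, num_partitions=NUM_PARTITIONS):
    """An HMS message concerns one table, so it goes to exactly one partition."""
    key = f"{message['db']}.{message['table']}"
    return {partition_for(key, num_partitions)}


def route_hs2(message, num_partitions=NUM_PARTITIONS):
    """An HS2 message goes to every partition owning an input or output table."""
    tables = message["inputs"] + message["outputs"]
    return {partition_for(f"{db}.{tbl}", num_partitions) for db, tbl in tables}


def dispatch(messages, num_partitions=NUM_PARTITIONS):
    """Append each message, in arrival order, to its target partition queues."""
    queues = defaultdict(list)
    for msg in messages:
        route = route_hms if msg["source"] == "HMS" else route_hs2
        for partition in sorted(route(msg, num_partitions)):
            queues[partition].append(msg)
    return queues


class Deduplicator:
    """Thread-safe 'first time seen' check, shared by all consumer threads.

    Assumption: an HS2 message fanned out to several partitions should have
    its lineage created only once, identified here by a message id.
    """

    def __init__(self):
        self._seen = set()
        self._lock = threading.Lock()

    def first_time(self, message_id):
        with self._lock:
            if message_id in self._seen:
                return False
            self._seen.add(message_id)
            return True
```

Because every message about a given table lands in that table's partition in arrival order, one consumer thread per partition sees the entity-creating HMS message before the HS2 lineage message that depends on it, while unrelated tables proceed in parallel.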
--
This message was sent by Atlassian Jira
(v8.20.10#820010)