[ https://issues.apache.org/jira/browse/KAFKA-1754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14204267#comment-14204267 ]
Neha Narkhede commented on KAFKA-1754:
--------------------------------------

bq. Streaming applications such as Samza, SparkStreaming and DataTorrents will benefit from running their workers on the same nodes as the partitions they are consuming data from. This is now possible in YARN.

[~gwenshap] We tried deploying Samza with Kafka partitions co-located on the same box. The main problem was I/O and memory resource isolation: we saw page cache issues between stateful jobs (which need to write to the local k/v store) and the Kafka brokers. In addition, Kafka's partitioning style doesn't lend itself to locality: writes go to arbitrary partitions (boxes) based on key, and reads are spread across partitions on many boxes.

This resource isolation issue is not specific to Samza; it will come up when running Kafka alongside any other I/O-heavy application on YARN.

> KOYA - Kafka on YARN
> --------------------
>
>                 Key: KAFKA-1754
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1754
>             Project: Kafka
>          Issue Type: New Feature
>            Reporter: Thomas Weise
>         Attachments: DT-KOYA-Proposal- JIRA.pdf
>
>
> YARN (Hadoop 2.x) has enabled clusters to be used for a variety of workloads,
> emerging as a distributed operating system for big data applications.
> Initiatives are under way to bring long-running services under the YARN
> umbrella, leveraging it for centralized resource management and operations
> ([YARN-896] and examples such as HBase, Accumulo or Memcached through
> Slider). This JIRA is to propose KOYA (Kafka On Yarn), a YARN application
> master to launch and manage Kafka clusters running on YARN. Brokers will use
> resources allocated through YARN with support for recovery, monitoring etc.
> Please see attached for more details.