jihoonson commented on issue #7838: Improve IncrementalIndex concurrency scalability
URL: https://github.com/apache/incubator-druid/pull/7838#issuecomment-502899750

@eranmeir and @sanastas, thank you for the explanation.

To speed up data ingestion, our previous work has focused on distributed ingestion. This is already done for stream ingestion in the Kafka/Kinesis indexing service. For batch ingestion, we already support ingestion with Hadoop, and native distributed batch indexing is also in development (https://github.com/apache/incubator-druid/issues/5543).

For stream ingestion, I think it's more important to serve as many queries as possible at the same time than to index faster. That means most threads should be assigned to query processing rather than indexing. In batch ingestion, however, there's no need to serve queries while indexing, so multi-threaded indexing makes more sense there.

Fewer segment merges sounds great, but it's still unclear to me exactly how this PR and Oak could improve Druid's data ingestion performance. Actually, I thought Oak was for improving the query performance of the incremental index.

I suspect my confusion comes from the lack of a [proposal](https://github.com/apache/incubator-druid/issues/new?assignees=&labels=Proposal%2C+Design+Review&template=proposal.md&title=). I know you opened https://github.com/apache/incubator-druid/issues/5698 and https://github.com/apache/incubator-druid/pull/7676, but many parts still seem unclear. For example, what exactly is the motivation for using Oak? Is it better memory management and better concurrent writes? I think it would be nicer if everything were explained and described in one proposal. I would happily help you with the proposal if you need.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

With regards,
Apache Git Services
