jihoonson commented on issue #7838: Improve IncrementalIndex concurrency 
scalability
URL: https://github.com/apache/incubator-druid/pull/7838#issuecomment-502899750
 
 
   @eranmeir and @sanastas, thank you for the explanation.
   
   For faster data ingestion, our previous work has focused on distributed 
data ingestion. This is already done for stream ingestion in the Kafka/Kinesis 
indexing service. For batch ingestion, we already support ingestion with Hadoop, 
and native distributed batch indexing is also in development 
(https://github.com/apache/incubator-druid/issues/5543). 
   
   For stream ingestion, I think it's more important to serve as many queries 
as possible at the same time than to index faster. That means most threads 
should be assigned to query processing rather than indexing. In batch 
ingestion, however, there's no need to serve queries while indexing, so it 
makes more sense to apply multi-threaded indexing. Fewer segment merges sound 
great, but it's still unclear to me exactly how this PR and Oak would improve 
Druid's data ingestion performance. Actually, I thought Oak was meant to 
improve the query performance of the incremental index.
   
   I suspect my confusion comes from the lack of a 
[proposal](https://github.com/apache/incubator-druid/issues/new?assignees=&labels=Proposal%2C+Design+Review&template=proposal.md&title=).
 I know you opened https://github.com/apache/incubator-druid/issues/5698 and 
https://github.com/apache/incubator-druid/pull/7676, but many parts still 
seem unclear. For example, what exactly is the motivation for using Oak? Is 
it better memory management and better concurrent writes? I think it would be 
nicer if everything were explained and described in one proposal. I would 
happily help you with the proposal if you need it.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
