kfaraz commented on PR #13982:
URL: https://github.com/apache/druid/pull/13982#issuecomment-1532446523

   @PramodSSImmaneni, 3 to 10 seconds seems like a reasonable time for a 
datasource with 4000 columns.
   Did you see this time increase abruptly in any specific scenario? I agree 
that there is room for parallelization here, but we need to understand the 
exact benefits that we would get from it. 
   
   It would be nice if you could share some more details on the following 
points:
   - Did real-time ingestion slow down at any point, i.e. was there a lag 
buildup?
   - How long did it take to persist the segments?
   - Once the segments were persisted, did lag catch up?
   - With parallelization, how did the lag behave?
   - What was your cluster setup (number of task slots, worker node types)?
   
   If we do find that parallelization is in fact needed here, it would be a 
MiddleManager/Indexer runtime property rather than a tuning config that has to 
be passed through all index specs. Users doing an ingestion need not be exposed 
to this detail.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

