PramodSSImmaneni commented on PR #13982:
URL: https://github.com/apache/druid/pull/13982#issuecomment-1533849615

   @kfaraz Creating and saving the incremental index files was slow (because of 
the large number of columns), which caused ingestion lag to increase 
continuously. It would take about 3 to 10 seconds to create the incremental 
index files after data had been ingested and the incremental segment was ready 
to be persisted (because the configured thresholds were reached). This would 
cause ingestion to fall behind even though there were available CPUs on the 
node. Multiple segment intervals are ingested at the same time, so there are 
multiple index files, and these were being saved serially. From what I could 
see, there were no dependencies between them, so they could be persisted in 
parallel. 
   
   We have a mixture of datasources, and some are small and don't need this 
higher degree of parallelism. Initially I considered an MM property, but it 
would apply to all datasources, and a larger datasource might be starved while 
a smaller one is using the extra threads. Making it a per-datasource setting 
allows the parallelism to be tuned for each datasource.
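
   As a rough illustration of the idea (not the code in this PR), the sketch 
below persists several independent index files on a fixed-size thread pool 
instead of one after another. The class name, `persistOne`, `persistAll`, and 
the `parallelism` argument are all hypothetical stand-ins; in particular, 
`parallelism` only plays the role of a per-datasource setting and is not an 
actual Druid property name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelPersistSketch {

    // Hypothetical stand-in for writing one interval's incremental index to disk.
    static String persistOne(String interval) {
        return "persisted:" + interval;
    }

    // Persists all pending intervals using at most `parallelism` threads.
    // `parallelism` is assumed to come from a per-datasource setting.
    // Results are returned in the same order as the input list.
    static List<String> persistAll(List<String> intervals, int parallelism)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, parallelism));
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String interval : intervals) {
                // The persists are assumed independent, so each gets its own task.
                Callable<String> task = () -> persistOne(interval);
                futures.add(pool.submit(task));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                // get() blocks until the task finishes and rethrows any failure.
                results.add(future.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Made-up interval labels, persisted with a parallelism of 2.
        List<String> intervals = Arrays.asList("interval-a", "interval-b", "interval-c");
        System.out.println(persistAll(intervals, 2));
    }
}
```

   With `parallelism` set to 1 this behaves like the serial case, so a small 
datasource could keep the old behaviour while a larger one uses more threads.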


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

