599166320 commented on issue #15177:
URL: https://github.com/apache/druid/issues/15177#issuecomment-1771999731

   I reviewed historical issues, and #12191, #8959, and this issue may
belong to the same category.
   
   The problem becomes particularly severe when ingesting schemaless data,
where it leads to task failures and data loss. I tried increasing parallelism
and reducing the size of individual segments, but in practice this is hard to
control, because with schemaless data each column has a different cardinality
and size. In one scenario, the data written per column was modest under normal
circumstances, but in exceptional cases a large volume of stack traces had to
be stored, producing an exceptionally large stack column. If we set the row
count per segment very low just to prevent buffer overflows, we end up with a
huge number of segment files and a large increase in metadata, which in turn
sharply degrades server performance and the scheduling performance of the
storage cluster.
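
   For reference, the tradeoff above is controlled by knobs like these in a
native parallel ingestion spec's `tuningConfig`. This is only a minimal sketch,
assuming the `index_parallel` task type with dynamic partitioning; the values
shown are illustrative, not recommendations:

   ```json
   {
     "tuningConfig": {
       "type": "index_parallel",
       "maxRowsInMemory": 100000,
       "maxBytesInMemory": 100000000,
       "partitionsSpec": {
         "type": "dynamic",
         "maxRowsPerSegment": 500000
       }
     }
   }
   ```

   Lowering `maxRowsPerSegment` reduces the chance of a single oversized column
blowing up a segment build, but as described above it multiplies the segment
count and metadata load, so it cannot fully solve the problem for skewed
schemaless data.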


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

