mcvsubbu commented on issue #4626: Low level realtime consumer (LLC) got into 
ERROR state due to thread race condition.
URL: 
https://github.com/apache/incubator-pinot/issues/4626#issuecomment-534301317
 
 
   > Thanks @mcvsubbu for the replies.
   > 
   > > Do you have a feel of why we were waiting so long for the semaphore? The 
number of parallel builds allowed can be increased. I wonder what value you 
have it set to.
   > 
   > Our max.parallel.segment.builds is set to 2. Since the affected table runs 
on a busy shared tenant, the value seems to be too low?

   Unfortunately there is no one right answer. It is a balancing act between disk I/O, CPU, and memory on the host where this is all happening. In general, having too many segments will make queries slow (and Helix somewhat slow, but the query effects are seen a lot sooner). Having segments that are too large will make segment builds slow and generate a lot of old-gen garbage. Using off-heap allocation alleviates some of this, but inverted indices are still on heap. Off-heap allocation can have issues while building segments if you have too many consuming segments pounding on an HDD. Remember, the pages of consuming segments are always dirty, so the OS will flush them to disk; your page-in/out rate could be high.
   
   You can try increasing the number of parallel segment builds. A GC cycle after the builds can reclaim a large chunk of old-gen memory.
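
   For reference, the number of parallel segment builds is capped by a server-level property. A hedged sketch of what that looks like (the exact key name can vary by Pinot version, so double-check it against your release; the value shown is illustrative):

   ```properties
   # Illustrative server config: caps how many segment builds may run
   # concurrently on one host. Raising it trades higher peak CPU, disk I/O,
   # and memory for shorter semaphore wait times during builds.
   pinot.server.instance.realtime.max.parallel.segment.builds=4
   ```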
   
   We have found that tuning based on segment size (rather than number of rows or time) works best. Check out the blog https://engineering.linkedin.com/blog/2019/auto-tuning-pinot. This is coming in the 0.2.0 release soon, and is available on master now.
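
   As a sketch of how the size-based tuning described in that blog is enabled in a table's stream configs (key names are from the configs of that era and may differ in later releases; the desired size shown is illustrative):

   ```json
   {
     "streamConfigs": {
       "realtime.segment.flush.threshold.size": "0",
       "realtime.segment.flush.desired.size": "200M"
     }
   }
   ```

   With the row threshold set to 0, the controller learns how many rows per segment are needed to hit the desired segment size, instead of flushing on a fixed row count or time interval.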
   
   > 
   > > How many consuming segments do you have in the same host? Are you 
building large segments? Is it time to spread these partitions around?
   > 
   > The number of consuming segments depends on the number of realtime table 
partitions assigned to the host, right? This could be in the range of a few 
dozens. We also think big segment size makes the problem worse and the first 
temporary fix I did was to lower the segment flush threshold from 5 hrs to 0.5 
hrs. That mitigated the issue.
   > 
   
   But you may run into the effects of too many small segments. If it works for you, great. You can also try moving completed segments to a different set of hosts. That is another feature that is on master and will be in 0.2.0.
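
   A hedged sketch of the tag-override table config that moves completed segments onto a different set of hosts than the consuming ones (the tenant tag names here are hypothetical, and the feature must be available in your release):

   ```json
   {
     "tenants": {
       "server": "busyTenant_REALTIME",
       "tagOverrideConfig": {
         "realtimeConsuming": "busyTenant_REALTIME",
         "realtimeCompleted": "completedTenant_OFFLINE"
       }
     }
   }
   ```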
   
   > > I am wondering if there will be performance issues anyway due to too 
many segments being present in the same host.
   > So far, apart from the ERROR state issue, the shared tenant seems to be fine in terms of ingestion and query performance. Our alert system did not fire additional alerts.
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
