mcvsubbu commented on issue #4626: Low level realtime consumer (LLC) got into ERROR state due to thread race condition. URL: https://github.com/apache/incubator-pinot/issues/4626#issuecomment-534301317

> Thanks @mcvsubbu for the replies.
>
> > Do you have a feel of why we were waiting so long for the semaphore? The number of parallel builds allowed can be increased. I wonder what value you have it set to.
>
> Our `max.parallel.segment.builds` is set to 2. Since the affected table runs on a busy shared tenant, the value seems to be too low?

Unfortunately there is no one right answer. It is a balancing act between disk I/O, CPU, and memory on the host where all of this is happening. In general, having too many segments will make queries slow (and Helix somewhat slow, but the query effects are seen a lot sooner). Having segments that are too large will make segment builds slow and contribute tons of old-gen garbage. Using off-heap allocation alleviates some of this, but inverted indices are still on heap. Off-heap allocation can have issues while building segments if you have too many consuming segments pounding on an HDD. Remember, the pages of consuming segments are always dirty, so the OS will flush them to disk; your page-in/out rate could be high.

You can try increasing the number of parallel segment builds. A GC cycle after the builds can reclaim a large chunk of old-gen memory. We have found that tuning by segment size (rather than by number of rows or by time) works best. Check out the blog https://engineering.linkedin.com/blog/2019/auto-tuning-pinot. This is coming in the 0.2.0 release soon, and is available on master now.

> > How many consuming segments do you have in the same host? Are you building large segments? Is it time to spread these partitions around?
>
> The number of consuming segments depends on the number of realtime table partitions assigned to the host, right? This could be in the range of a few dozen. We also think a large segment size makes the problem worse, and the first temporary fix I did was to lower the segment flush threshold from 5 hrs to 0.5 hrs. That mitigated the issue.

But you may run into the effect of too many small segments. If it works for you, great. You can also try moving completed segments to a different set of hosts, another feature that is in master and will be in 0.2.0.

> I am wondering if there will be performance issues anyway due to too many segments being present on the same host.
>
> Aside from the ERROR state issue, the shared tenant seems to be fine in terms of ingestion and query performance so far. Our alert system did not fire additional alerts.
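For readers following along, the pattern discussed above (a fixed number of permits gating how many segment builds run at once, with consumers blocking while they wait for a slot) can be sketched in plain Java. This is an illustrative sketch, not Pinot's actual implementation; the class, method, and parameter names are made up.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Illustrative sketch (not Pinot's actual code) of gating concurrent
// segment builds with a semaphore.
public class SegmentBuildThrottle {
    // Number of permits mirrors the max.parallel.segment.builds setting.
    private final Semaphore buildSemaphore;

    public SegmentBuildThrottle(int maxParallelBuilds) {
        buildSemaphore = new Semaphore(maxParallelBuilds);
    }

    /** Runs the build if a slot frees up within the timeout; returns false otherwise. */
    public boolean buildSegment(Runnable build, long timeoutSeconds) {
        try {
            // On a busy shared tenant with few permits, threads can block
            // here for a long time; this is the wait described in the issue.
            if (!buildSemaphore.tryAcquire(timeoutSeconds, TimeUnit.SECONDS)) {
                return false; // timed out waiting for a build slot
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        try {
            build.run();
            return true;
        } finally {
            buildSemaphore.release(); // free the slot for the next build
        }
    }
}
```

Raising the permit count reduces the time spent blocked on the semaphore, at the cost of more concurrent disk I/O, CPU, and memory pressure during builds, which is the balancing act described above.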
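For concreteness, the time-based flush threshold mentioned above (and the segment-size-based tuning the blog describes) lives in the table's `streamConfigs`. The sketch below shows roughly what that looks like; the exact property names and supported values should be checked against your Pinot version's documentation.

```json
"streamConfigs": {
  "realtime.segment.flush.threshold.time": "6h",
  "realtime.segment.flush.threshold.size": "0",
  "realtime.segment.flush.desired.size": "150M"
}
```

With size-based tuning, the row-count threshold is set to 0 and Pinot adjusts the flush point to converge on the desired segment size, rather than flushing on a fixed clock or row count.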
