dene14 opened a new issue #8155: Historical node periodic timeouts for /druid/historical/v1/readiness
URL: https://github.com/apache/incubator-druid/issues/8155

### Affected Version

0.14.2

### Description

- Cluster size
  - 7 historicals x 500 GB HDD x 16 GB RAM
  - 3 brokers
  - 6 middlemanagers x 2 workers (24 GB RAM per MM)

Running in Docker on K8S.

Historical nodes stop responding to the `readiness` endpoint (connection timeout) approximately an hour after start. As a result, the container gets killed. I suspect some sort of periodic routine is running in the Historical, but unfortunately nothing relevant is logged.

I've found a similar issue, but it's relatively old (Druid 0.8 era): https://groups.google.com/forum/#!topic/druid-user/vPcffwqf7d4

I recently had to change the ingestion spec for one of my datasources in order to add datasketches. `maxRowsPerSegment` was reduced from 500K to 50K; otherwise it was allocating too many littleBufs (>120K) and dying with "Not enough direct memory". As a result of that change, the number of segments (or segment partitions) jumped dramatically. I suspect this is connected somehow.

Any advice is much appreciated. If any additional information is required, please let me know.
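
For anyone trying to reproduce this, the sketch below is one way to record when the endpoint stops answering and how long each request takes, independently of the Kubernetes probe. It is only an illustration under assumptions: the host name `historical-0` is hypothetical, port 8083 is assumed to be the Historical's default port, and treating only HTTP 200 as "ready" is an assumption about the endpoint's behaviour. Only the path `/druid/historical/v1/readiness` comes from the report above.

```python
import time
import urllib.error
import urllib.request


def probe(url, timeout=5.0):
    """Issue one GET and return (ok, elapsed_seconds, detail)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            # Assumption: only HTTP 200 counts as "ready".
            ok, detail = resp.status == 200, f"HTTP {resp.status}"
    except urllib.error.HTTPError as exc:
        # Non-2xx answers arrive here; the node responded but is not ready.
        ok, detail = False, f"HTTP {exc.code}"
    except (urllib.error.URLError, OSError) as exc:
        # Connection refused, connect timeout, or read timeout.
        ok, detail = False, f"error: {exc}"
    return ok, time.monotonic() - start, detail


def watch(url, interval=10.0, timeout=5.0, count=None):
    """Probe repeatedly and print one timestamped line per attempt."""
    attempt = 0
    while count is None or attempt < count:
        ok, elapsed, detail = probe(url, timeout)
        stamp = time.strftime("%Y-%m-%dT%H:%M:%S")
        status = "ready" if ok else "NOT READY"
        print(f"{stamp} {status} {elapsed:.3f}s {detail}")
        attempt += 1
        time.sleep(interval)


if __name__ == "__main__":
    # Hypothetical host; 8083 is assumed to be the Historical's port.
    watch("http://historical-0:8083/druid/historical/v1/readiness")
```

Running this alongside the container and lining its timestamps up against the Historical's logs may show whether the timeouts follow a fixed period, which would support the "periodic routine" suspicion.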
