dlg99 opened a new issue #1086: io.netty.util.internal.OutOfDirectMemoryError under continuous heavy load
URL: https://github.com/apache/bookkeeper/issues/1086

Bug report

Ran a long stress test (~1 hr) of a write load that was bottlenecked by NIC bandwidth (10G). Some bookies in the cluster could not handle this load (due to compaction, the disk configurations we tested, etc.).

Got a write timeout; the client's logs contained "io.netty.util.internal.OutOfDirectMemoryError". The OODME appeared even in shorter runs that succeeded (ensemble changes/bookie reconnects completed before the request timeout). The client had a 10G heap and 10G of direct memory allocated.

Expected the load to complete without write errors, possibly with a temporary decrease in throughput.

After investigation, it appears that PCBC (PerChannelBookieClient) does not respect Netty's isWritable() state and keeps writing to a channel that cannot drain fast enough. This is possible because we had Ack Quorum == 2 and Write Quorum == 3: a write is considered successful as soon as AQ bookies have replied, so one "slow channel" keeps getting data written to it, and this can use up all available direct memory.
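
To illustrate the failure mode described above, here is a minimal toy model in plain Java. It is only a sketch under stated assumptions, not BookKeeper or Netty code: `ToyChannel`, `run`, the entry size, the drain rates, and the water mark are all hypothetical names and values chosen for illustration. The policy shown (pause all writes while any channel in the write set is not writable) is one possible form of backpressure, not necessarily the right fix for PCBC.

```java
import java.util.Arrays;

final class SlowChannelSketch {
    static final long ENTRY_SIZE = 1024;            // bytes per entry (arbitrary)
    static final long HIGH_WATER_MARK = 64 * 1024;  // toy writability threshold (arbitrary)

    /** Toy stand-in for a channel's outbound buffer; this is NOT the Netty API. */
    static final class ToyChannel {
        final long drainPerTick;   // bytes the remote bookie absorbs per tick
        long pendingBytes;         // queued in memory, not yet on the wire
        long peakPendingBytes;     // largest backlog observed so far

        ToyChannel(long drainPerTick) {
            this.drainPerTick = drainPerTick;
        }

        // Mirrors the idea behind isWritable(): false once the backlog is too large.
        boolean isWritable() {
            return pendingBytes < HIGH_WATER_MARK;
        }

        void write(long bytes) {
            pendingBytes += bytes;
            peakPendingBytes = Math.max(peakPendingBytes, pendingBytes);
        }

        void drain() {
            pendingBytes = Math.max(0, pendingBytes - drainPerTick);
        }
    }

    /**
     * Simulates WQ == 3 with two fast bookies and one slow one. The two fast
     * bookies satisfy AQ == 2, so the writer is free to issue one entry per tick.
     * Returns the peak backlog (in bytes) on the slow channel.
     */
    static long run(boolean respectWritability, int ticks) {
        ToyChannel slow = new ToyChannel(ENTRY_SIZE / 4);
        ToyChannel[] writeSet = {
            new ToyChannel(ENTRY_SIZE), new ToyChannel(ENTRY_SIZE), slow
        };
        for (int t = 0; t < ticks; t++) {
            boolean allWritable =
                Arrays.stream(writeSet).allMatch(ToyChannel::isWritable);
            // Without the check, the slow channel is written to unconditionally.
            if (!respectWritability || allWritable) {
                for (ToyChannel c : writeSet) {
                    c.write(ENTRY_SIZE);
                }
            }
            for (ToyChannel c : writeSet) {
                c.drain();
            }
        }
        return slow.peakPendingBytes;
    }

    public static void main(String[] args) {
        System.out.println("ignoring writability:   peak backlog = " + run(false, 10_000));
        System.out.println("respecting writability: peak backlog = " + run(true, 10_000));
    }
}
```

In this model the backlog on the slow channel grows linearly with the length of the run when writability is ignored, and stays within roughly one entry of the water mark when it is respected, at the cost of throughput dropping to the slow bookie's rate while it is behind.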
