dlg99 opened a new issue #1086: io.netty.util.internal.OutOfDirectMemoryError under continuous heavy load.
URL: https://github.com/apache/bookkeeper/issues/1086
 
 
   Bug report
   
   Ran a long stress test (~1 hr) of a write load that was bottlenecked by NIC 
bandwidth (10G).
   Some bookies in the cluster could not keep up with this load (due to compaction, 
the disk configurations we tested, etc.).
   Got a write timeout; the client's logs contained 
"io.netty.util.internal.OutOfDirectMemoryError" (OODME).
   OODME appeared even in shorter runs that succeeded (ensemble changes/bookie 
reconnects completed before the request timeout).
   The client had a 10G heap and 10G of direct memory allocated.
   
   Expected the load to complete without write errors, possibly with a temporary 
decrease in throughput.
   
   After investigation, it appears that PCBC (PerChannelBookieClient) does not 
respect Netty's isWritable() state and keeps writing to a channel that cannot 
drain fast enough.
   This is possible because we had Ack Quorum == 2 and Write Quorum == 3.
   A write is considered successful once AQ bookies have replied, so the one 
"slow channel" keeps getting data written to it, and this can use up all 
available direct memory.
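
   The failure mode described above can be illustrated with a small, 
self-contained model. This is only a sketch, not BookKeeper or Netty code: the 
class SlowChannelSketch, the method peakQueuedBytes, the drain rates and the 
high-water mark are all hypothetical, and the "respectWritable" flag merely 
imitates what checking Netty's isWritable() before each write would do. It 
assumes the client issues one entry per tick to every channel in the write 
quorum, because the two fast channels always satisfy the ack quorum.

```java
class SlowChannelSketch {

    /**
     * Simulates writing `entries` entries of `entrySize` bytes to every channel.
     * drainPerTick[i] is how many bytes channel i can flush per tick (assumed values).
     * Returns the peak number of bytes queued on any single channel.
     */
    static long peakQueuedBytes(int entries, int entrySize, int[] drainPerTick,
                                boolean respectWritable, long highWaterMark) {
        long[] queued = new long[drainPerTick.length];
        long peak = 0;
        int written = 0;

        while (written < entries) {
            // A channel counts as "writable" while its queue is below the high-water mark.
            boolean allWritable = true;
            for (long q : queued) {
                if (q >= highWaterMark) {
                    allWritable = false;
                }
            }

            // Without the check, the client always writes: the fast channels
            // satisfy the ack quorum, so nothing slows the producer down.
            if (!respectWritable || allWritable) {
                for (int i = 0; i < queued.length; i++) {
                    queued[i] += entrySize;
                }
                written++;
            }

            // Record the peak, then let each channel flush what it can this tick.
            for (int i = 0; i < queued.length; i++) {
                peak = Math.max(peak, queued[i]);
                queued[i] = Math.max(0, queued[i] - drainPerTick[i]);
            }
        }
        return peak;
    }

    public static void main(String[] args) {
        int[] drain = {1024, 1024, 256}; // two fast channels, one slow channel
        long hwm = 64 * 1024;
        System.out.println("ignoring writability:   "
                + peakQueuedBytes(10_000, 1024, drain, false, hwm));
        System.out.println("respecting writability: "
                + peakQueuedBytes(10_000, 1024, drain, true, hwm));
    }
}
```

   In this model the queue on the slow channel grows with the number of 
entries when writability is ignored, and stays within one entry of the 
high-water mark when the producer pauses while any channel is unwritable. 
Pausing the producer is only one possible policy; it trades throughput for 
bounded memory, which matches the "temporary decrease in throughput" 
expectation stated above.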

