sidkhillon opened a new pull request, #7741:
URL: https://github.com/apache/hbase/pull/7741

   When the WAL tailing reader hits EOF mid-cell while reading a compressed 
WAL, it currently returns EOF_AND_RESET_COMPRESSION, which forces the reader to 
re-read the entire WAL file from the beginning to rebuild dictionary state. 
This is an O(n) operation in the size of the WAL, so it becomes increasingly 
expensive as the WAL grows.
   
   The root cause is that the CompressedKvDecoder eagerly adds entries to the 
compression dictionaries (ROW, FAMILY, QUALIFIER, and tag dictionaries) as it 
reads each field of a cell. If an IOException occurs partway through reading a 
cell, the dictionaries are left in a partially-updated state that no longer 
matches the actual stream position. The reader has no choice but to throw away 
the entire compression context and start over.
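
   To illustrate the failure mode, here is a minimal sketch, not the actual 
CompressedKvDecoder code. The class EagerDecoderSketch, its list-backed 
dictionaries, and the use of a String array as a stand-in for the stream are 
all hypothetical. A short array simulates hitting EOF mid-cell.

```java
import java.io.EOFException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical eager decoder: each field enters its dictionary as soon as it is read. */
final class EagerDecoderSketch {
    final List<String> rowDict = new ArrayList<>();
    final List<String> familyDict = new ArrayList<>();
    final List<String> qualifierDict = new ArrayList<>();

    /** Expects {row, family, qualifier}; a shorter array simulates a truncated stream. */
    void parseCell(String[] fields) throws IOException {
        List<List<String>> dicts = List.of(rowDict, familyDict, qualifierDict);
        for (int i = 0; i < dicts.size(); i++) {
            if (i >= fields.length) {
                // Fields read before this point have already been added.
                throw new EOFException("stream ended mid-cell");
            }
            dicts.get(i).add(fields[i]);
        }
    }
}
```

   In this sketch, a truncated read of a cell followed by a retry of the same 
cell adds the row twice, so the reader's dictionary indices no longer match 
the writer's.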
   
   The proposed fix is to defer dictionary additions until a cell is fully parsed:
   
   - Buffer ROW/FAMILY/QUALIFIER dictionary additions in CompressedKvDecoder 
     and only commit them after parseCellInner() succeeds. On IOException, 
     discard the pending additions.
   - Add a similar deferred-addition mode to TagCompressionContext for tag 
     dictionaries.
   - Reset the ValueCompressor if an IOException occurs during the value 
     decompression phase.
   
   With deferred additions, hitting EOF mid-cell leaves the dictionaries in the 
state they were in after the last fully read cell. The reader can therefore 
return EOF_AND_RESET (a cheap seek to the saved position) instead of 
EOF_AND_RESET_COMPRESSION, and resume reading from where it left off once the 
file grows.
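
   The deferred-addition idea can be sketched as follows. This is a simplified 
illustration under stated assumptions, not the code in this pull request: 
DeferredDictionary, DeferredDecoderSketch, and their methods are hypothetical 
names, and a String array again stands in for the stream. Only parseCellInner() 
is a name taken from the description above.

```java
import java.io.EOFException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical dictionary that stages additions until commit(); not the HBase Dictionary API. */
final class DeferredDictionary {
    private final List<String> committed = new ArrayList<>();
    private final List<String> pending = new ArrayList<>();

    void addDeferred(String entry) { pending.add(entry); }

    /** Make all staged entries part of the dictionary. */
    void commit() { committed.addAll(pending); pending.clear(); }

    /** Drop staged entries, leaving the committed state untouched. */
    void rollback() { pending.clear(); }

    /** Number of committed entries only. */
    int size() { return committed.size(); }
}

/** Hypothetical decoder that commits dictionary additions only after a full cell is parsed. */
final class DeferredDecoderSketch {
    final DeferredDictionary rowDict = new DeferredDictionary();
    final DeferredDictionary familyDict = new DeferredDictionary();
    final DeferredDictionary qualifierDict = new DeferredDictionary();
    private final List<DeferredDictionary> dicts = List.of(rowDict, familyDict, qualifierDict);

    /** Expects {row, family, qualifier}; a shorter array simulates a truncated stream. */
    void parseCell(String[] fields) throws IOException {
        try {
            parseCellInner(fields);
            dicts.forEach(DeferredDictionary::commit);
        } catch (IOException e) {
            // Discard pending additions so the state matches the last complete cell.
            dicts.forEach(DeferredDictionary::rollback);
            throw e;
        }
    }

    private void parseCellInner(String[] fields) throws IOException {
        for (int i = 0; i < dicts.size(); i++) {
            if (i >= fields.length) {
                throw new EOFException("stream ended mid-cell");
            }
            dicts.get(i).addDeferred(fields[i]);
        }
    }
}
```

   In this sketch, a truncated cell leaves every dictionary at its size after 
the previous complete cell, so retrying the same cell once more data is 
available adds each entry exactly once.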


