[ 
https://issues.apache.org/jira/browse/HBASE-29890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-29890:
-----------------------------------
    Labels: pull-request-available  (was: )

> WAL tailing reader should resume partial cell reads instead of resetting 
> compression
> ------------------------------------------------------------------------------------
>
>                 Key: HBASE-29890
>                 URL: https://issues.apache.org/jira/browse/HBASE-29890
>             Project: HBase
>          Issue Type: Improvement
>          Components: Replication, wal
>            Reporter: Sid Khillon
>            Assignee: Sid Khillon
>            Priority: Minor
>              Labels: pull-request-available
>
> When the WAL tailing reader hits EOF mid-cell with WAL compression enabled, it 
> currently returns EOF_AND_RESET_COMPRESSION, which forces the reader to 
> re-read the entire WAL file from the beginning to rebuild dictionary state. 
> This is an O(n) operation that becomes increasingly expensive as the WAL 
> grows.
>  The root cause is that the CompressedKvDecoder eagerly adds entries to the 
> compression dictionaries (ROW, FAMILY, QUALIFIER, and tag dictionaries) as it 
> reads each field of a cell. If an IOException occurs partway through reading 
> a cell, the dictionaries are left in a partially-updated state that no longer 
> matches the actual stream position. The reader has no choice but to throw 
> away the entire compression context and start over.
> The proposed fix is to defer dictionary additions until a cell is fully parsed:
>   - Buffer ROW/FAMILY/QUALIFIER dictionary additions in CompressedKvDecoder 
> and only commit them after parseCellInner() succeeds. On IOException, discard 
> the pending additions.
>   - Add a similar deferred-addition mode to TagCompressionContext for tag 
> dictionaries.
>   - Reset the ValueCompressor if an IOException occurs during the value 
> decompression phase.
> With deferred additions, hitting EOF mid-cell leaves the dictionaries in the 
> state they were in after the last fully-read cell. This means the reader can 
> return EOF_AND_RESET (a cheap seek to the saved position) instead of 
> EOF_AND_RESET_COMPRESSION, and resume reading from where it left off once the 
> file grows.
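
The deferred-addition idea described above can be sketched roughly as follows. This is a minimal illustration only, not the HBase implementation: DeferredCellReader, DeferredDictionary, readCell() and the use of a null field to simulate EOF are all hypothetical names and simplifications invented for this example. Entries are buffered while a cell is being parsed, committed only when the whole cell has been read, and discarded on IOException.

```java
import java.io.EOFException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; none of these names are HBase APIs. */
final class DeferredCellReader {

  /** Stand-in for a compression dictionary with a pending (uncommitted) buffer. */
  static final class DeferredDictionary {
    private final List<String> committed = new ArrayList<>();
    private final List<String> pending = new ArrayList<>();

    /** Buffers an entry and returns the index it will have once committed. */
    int addDeferred(String entry) {
      pending.add(entry);
      return committed.size() + pending.size() - 1;
    }

    /** Makes the entries buffered for the current cell permanent. */
    void commit() {
      committed.addAll(pending);
      pending.clear();
    }

    /** Drops the entries buffered for a cell that was not fully read. */
    void rollback() {
      pending.clear();
    }

    int committedSize() {
      return committed.size();
    }
  }

  private final DeferredDictionary dict = new DeferredDictionary();

  /**
   * Reads one "cell" made of the given fields. A null field simulates
   * hitting EOF partway through the cell (assumption for this sketch).
   * Returns true if the cell was fully read and its entries committed.
   */
  boolean readCell(String... fields) {
    try {
      for (String field : fields) {
        if (field == null) {
          throw new EOFException("truncated cell");
        }
        dict.addDeferred(field);
      }
      dict.commit();
      return true;
    } catch (IOException e) {
      // Dictionary stays as it was after the last fully-read cell.
      dict.rollback();
      return false;
    }
  }

  int dictionarySize() {
    return dict.committedSize();
  }
}
```

In this sketch a failed readCell() leaves the committed dictionary untouched, so the caller could retry the same cell from its saved position once more data is available, which is the property the description relies on to avoid rebuilding dictionary state from the start of the file.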



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
