[
https://issues.apache.org/jira/browse/HBASE-14949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15050958#comment-15050958
]
Heng Chen commented on HBASE-14949:
-----------------------------------
This issue is not as simple as I first thought: different WALs from one RS
may be split by different RSes (each RS starts its own SplitLogWorker).
Because skipping duplicate entries requires knowing what the other WALs
contain, it seems difficult to do this during WAL splitting.
Maybe we should go back to filtering duplicate entries at replay time:
read the first entry of each file to learn that file's minSeqId, then use
this information to filter out the duplicates.
I have updated the patch. Thoughts?
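A minimal sketch of the replay-time idea described above, under the assumption that overlap only occurs at file boundaries (the new WAL re-writes the unacked tail of the old one): when replaying file i, skip any entry whose seqId is >= the first (minimum) seqId of file i+1. The `Entry` record, `replay` method, and class name are hypothetical illustrations, not actual HBase APIs.

```java
import java.util.*;

public class WalReplayDedup {
    // Hypothetical WAL entry: just a sequence id plus a payload.
    record Entry(long seqId, String payload) {}

    // Replay WAL files in order; for each file, skip entries whose seqId
    // is >= the first (min) seqId of the next file, since the next file
    // re-wrote those unacked entries after the hflush failure.
    static List<Entry> replay(List<List<Entry>> walFiles) {
        List<Entry> applied = new ArrayList<>();
        for (int i = 0; i < walFiles.size(); i++) {
            // Upper bound: the first entry's seqId of the following WAL.
            long nextMinSeqId = (i + 1 < walFiles.size())
                ? walFiles.get(i + 1).get(0).seqId()
                : Long.MAX_VALUE;
            for (Entry e : walFiles.get(i)) {
                if (e.seqId() < nextMinSeqId) {
                    applied.add(e);   // unique entry: apply it
                }                     // else: duplicate, skip it
            }
        }
        return applied;
    }

    public static void main(String[] args) {
        // Old WAL holds 1..4; the hflush of 3..4 was not acked, so a new
        // WAL re-writes 3..4 and continues with 5. Entries 3,4 overlap.
        List<Entry> oldWal = List.of(new Entry(1, "a"), new Entry(2, "b"),
                                     new Entry(3, "c"), new Entry(4, "d"));
        List<Entry> newWal = List.of(new Entry(3, "c"), new Entry(4, "d"),
                                     new Entry(5, "e"));
        List<Entry> applied = replay(List.of(oldWal, newWal));
        // Each seqId is applied exactly once: 1,2,3,4,5.
        System.out.println(applied.size()); // prints 5
    }
}
```

Note that this only needs one extra read (the first entry) per WAL file, which matches the comment's suggestion of learning each file's minSeqId up front.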
> Skip duplicate entries when replay WAL.
> ---------------------------------------
>
> Key: HBASE-14949
> URL: https://issues.apache.org/jira/browse/HBASE-14949
> Project: HBase
> Issue Type: Sub-task
> Reporter: Heng Chen
> Attachments: HBASE-14949.patch
>
>
> Per the HBASE-14004 design, there can be duplicate entries across
> different WALs. This happens when an hflush fails: we close the old WAL
> at the 'acked hflushed' length, then open a new WAL and write the
> unacked hflushed entries into it, so there may be some overlap between
> the old WAL and the new WAL.
> We should skip the duplicate entries on replay. I think this does no
> harm to the current logic, so maybe we do it first.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)