[
https://issues.apache.org/jira/browse/HBASE-26658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17474223#comment-17474223
]
Duo Zhang commented on HBASE-26658:
-----------------------------------
{quote}
I do not see why there is data loss. Would you mind explaining more? In
appendAndSync, we do clear toWriteAppends after we successfully send the
entries, but we also put them in unackedAppends at the same time, so if we face
an HDFS failure again, unackedAppends would be transferred back to
toWriteAppends to send them again; there seems to be no data loss.
{quote}
OK, I missed that part too. So there is no data loss, but there could be other
problems. We need to track all the entries which have already been sent out
without an ack, as we do not know the state of those entries. If we just clear
them after transferring them back to toWriteAppends, and then there is a
shutdown, we may report to the upper layer that these entries have not been
written out, but this may not be true: they may have already been successfully
persisted to HDFS, and only the ack failed to reach us due to a network issue
between the region server and the data node.
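The two-queue bookkeeping under discussion can be sketched roughly as follows. This is a minimal illustration of the pattern, not the actual AsyncFSWAL code; the class and method names (WalQueues, markSent, ack, onPipelineFailure) are hypothetical:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Minimal sketch (not the real HBase implementation) of the two-queue
// retry pattern discussed above: entries handed to the output stream
// but not yet acked are tracked in unackedAppends, and on a pipeline
// failure they are moved back to the head of toWriteAppends AND
// cleared from unackedAppends, so unackedAppends only ever reflects
// entries in flight on the current pipeline.
class WalQueues<E> {
  final Deque<E> toWriteAppends = new ArrayDeque<>();
  final Deque<E> unackedAppends = new ArrayDeque<>();

  // Called when an entry is handed to the (hypothetical) output
  // stream: it leaves toWriteAppends but must stay tracked until
  // it is acked.
  void markSent(E entry) {
    toWriteAppends.remove(entry);
    unackedAppends.addLast(entry);
  }

  // Called on a successful ack from the pipeline.
  void ack(E entry) {
    unackedAppends.remove(entry);
  }

  // Called when the HDFS pipeline fails: re-queue every in-flight
  // entry at the front of toWriteAppends, preserving the original
  // order, and leave unackedAppends empty so it carries no stale
  // entries from the dead pipeline.
  void onPipelineFailure() {
    while (!unackedAppends.isEmpty()) {
      toWriteAppends.addFirst(unackedAppends.pollLast());
    }
  }
}
```

The pollLast/addFirst loop drains the unacked queue from newest to oldest, so the oldest unacked entry ends up first in toWriteAppends, ahead of any entries that were never sent.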
> AsyncFSWAL.unackedAppends should clear after transfered to
> AsyncFSWAL.toWriteAppends
> --------------------------------------------------------------------------------------
>
> Key: HBASE-26658
> URL: https://issues.apache.org/jira/browse/HBASE-26658
> Project: HBase
> Issue Type: Bug
> Components: wal
> Affects Versions: 3.0.0-alpha-2, 2.4.9
> Reporter: chenglei
> Assignee: chenglei
> Priority: Major
>
> When {{AsyncFSWAL}} fails to sync to HDFS, the entries in
> {{AsyncFSWAL.unackedAppends}} are transferred to
> {{AsyncFSWAL.toWriteAppends}} to avoid data loss, but
> {{AsyncFSWAL.unackedAppends}} itself is not cleared. I think there is no need
> to keep retaining them in {{AsyncFSWAL.unackedAppends}}, because we would
> open a new HDFS pipeline to resend the transferred entries.
> BTW: it would also simplify the logic for fixing HBASE-25905; the current fix
> for HBASE-25905 is somewhat hard to understand. I think the root cause of
> HBASE-25905 is that {{AsyncFSWAL.unackedAppends}} does not exactly reflect
> the *unacked* entries for the current HDFS pipeline. If we clear
> {{AsyncFSWAL.unackedAppends}} after transferring them to
> {{AsyncFSWAL.toWriteAppends}}, HBASE-25905 could also be avoided.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)