[
https://issues.apache.org/jira/browse/HBASE-14790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15037338#comment-15037338
]
Duo Zhang commented on HBASE-14790:
-----------------------------------
I read the code in {{NameNode}} and {{DFSOutputStream}} and I think I
understand why [~zhz] said bumping the generation stamp (GS) is necessary.
There are two scenarios:
1. The endBlock operation finished successfully on at least one datanode. In
this scenario we can simply call completeFile to close the file, since we know
the exact file length.
2. The endBlock operation failed on all datanodes. In this scenario, the
"acked length" may not be the actual length of the block: the block may be
longer, which would make this assert at the namenode fail:
{code}
assert block.getNumBytes() <= commitBlock.getNumBytes() :
    "commitBlock length is less than the stored one "
    + commitBlock.getNumBytes() + " vs. " + block.getNumBytes();
{code}
And even if we pass the assert, it does not mean the block has the right
length, since the replicas may not have been reported to the namenode yet. It
is also not safe to truncate the block, because another reader may have
already read data beyond the truncation point (think of WAL replication). So
in this scenario, at the very least we need to reach a consensus on the block
length with each datanode before completing the file. Maybe bumping the GS is
the only way to do this in HDFS?
> Implement a new DFSOutputStream for logging WAL only
> ----------------------------------------------------
>
> Key: HBASE-14790
> URL: https://issues.apache.org/jira/browse/HBASE-14790
> Project: HBase
> Issue Type: Improvement
> Reporter: Duo Zhang
>
> The original {{DFSOutputStream}} is very powerful and aims to serve all
> purposes. But in fact, we do not need most of its features if we only want to
> log WAL. For example, we do not need pipeline recovery, since we can just
> close the old logger and open a new one. Likewise, we do not need to write
> multiple blocks, since we can also open a new logger if the old file grows
> too large.
> And the most important thing is that it is hard to handle all the corner
> cases to avoid data loss or data inconsistency (such as HBASE-14004) when
> using the original {{DFSOutputStream}}, due to its complicated logic. That
> complicated logic also forces us to use magical tricks to increase
> performance. For example, we need multiple threads calling {{hflush}} when
> logging, and we currently use 5 threads. But why 5, and not 10 or 100?
> So I propose that we implement our own {{DFSOutputStream}} for logging WAL,
> for both correctness and performance.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)