Hi Yanbo,

> When writing its edit log, HBase appends new edits to the end of the WAL
> file rather than reopening the HDFS file every second.

I meant the HBase replication procedure. After edits are appended to the
HLog, I believe a background thread periodically polls the HLog for new
edits and ships them to the slave clusters.
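
To make the behaviour discussed in this thread concrete, here is a minimal
Python simulation. It is not HDFS or HBase code: AppendOnlyFile,
SnapshotReader, FollowingReader and poll_by_reopening are hypothetical names,
and the model assumes a single writer whose data becomes visible to readers
only after hflush. SnapshotReader mimics the current behaviour (length fixed
at open, EOF past it), FollowingReader mimics the proposed one (no cached
length; an empty reply from the "DN" means no more data yet), and
poll_by_reopening mimics the replication tailer's reopen-and-seek loop.

```python
from __future__ import annotations


class AppendOnlyFile:
    """Hypothetical in-memory stand-in for a file that one writer appends to
    and hflushes. It is a model, not the HDFS API."""

    def __init__(self) -> None:
        self._flushed = bytearray()  # bytes readers are allowed to see
        self._pending = bytearray()  # bytes appended but not yet hflushed

    def append(self, data: bytes) -> None:
        self._pending += data

    def hflush(self) -> None:
        # After hflush, the appended bytes become visible to new reads.
        self._flushed += self._pending
        self._pending.clear()

    def visible_length(self) -> int:
        # The length a reader learns when it opens the file.
        return len(self._flushed)

    def read_at(self, pos: int, n: int) -> bytes:
        # Stand-in for a DataNode read: returns b"" when pos is at the end.
        return bytes(self._flushed[pos:pos + n])


class SnapshotReader:
    """Models the reported behaviour: the length is fixed when the file is opened."""

    def __init__(self, f: AppendOnlyFile, pos: int = 0) -> None:
        self._f = f
        self.length = f.visible_length()  # cached once, at open time
        self.pos = pos

    def read(self, n: int) -> bytes:
        if self.pos >= self.length:
            return b""  # client-side EOF, even if more data was hflushed later
        data = self._f.read_at(self.pos, min(n, self.length - self.pos))
        self.pos += len(data)
        return data


class FollowingReader:
    """Models the proposal: no cached length; an empty reply means EOF for now."""

    def __init__(self, f: AppendOnlyFile) -> None:
        self._f = f
        self.pos = 0

    def read(self, n: int) -> bytes:
        data = self._f.read_at(self.pos, n)
        self.pos += len(data)
        return data


def poll_by_reopening(f: AppendOnlyFile, pos: int) -> tuple[bytes, int]:
    """One iteration of the reopen-and-seek workaround used by a log tailer."""
    r = SnapshotReader(f, pos)  # a fresh open picks up the current length
    data = r.read(max(r.length - pos, 0))
    return data, r.pos


if __name__ == "__main__":
    f = AppendOnlyFile()
    f.append(b"edit1;")
    f.hflush()
    snap, follow = SnapshotReader(f), FollowingReader(f)
    f.append(b"edit2;")
    f.hflush()
    print(snap.read(100))           # b'edit1;' (stops at the length seen at open)
    print(snap.read(100))           # b'' (EOF, although edit2 was hflushed)
    print(follow.read(100))         # b'edit1;edit2;'
    print(poll_by_reopening(f, 6))  # (b'edit2;', 12)
```

In this model every poll_by_reopening call is a fresh open, which in real
HDFS is where the NN load comes from; FollowingReader avoids reopening at the
cost of asking the "DN" on every read.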


2013/12/27 Yanbo Liang <yanboha...@gmail.com>

> Hi Chao,
> As far as I know, if client B opens a file that is under construction,
> the DFSInputStream gets a LocatedBlocks object, which has a member
> variable called "underConstruction" that marks the file as under
> construction.
> If the file is reopened, the client gets a different length. I think
> this makes sense, because the file is no longer the old one but one
> with newly appended data.
>
> When writing its edit log, HBase appends new edits to the end of the WAL
> file rather than reopening the HDFS file every second.
>
>
> 2013/12/27 Chao Shi <stepi...@live.com>
>
>> Hi users,
>>
>> Suppose client A opens /f and keeps appending data and hflushing it.
>> Another client B opens the same file for reading. I found that B only sees
>> a snapshot of the data as of the moment it opened the file. (After B opens
>> it, A may keep writing more data, but B cannot see it unless it reopens
>> the file.)
>>
>> Looking into the code, I think this is because DFSInputStream keeps track
>> of a file length and simply reports EOF when we read beyond it. The file
>> length is updated, giving the client a chance to see a longer file, when:
>> 1) the file is opened
>> 2) there are no live DNs to read from (correct? not very sure.)
>> I think this behaviour is inconsistent: clients may see the file length
>> change suddenly. I think a better behaviour would be for the client to
>> always try to read beyond the known file length and let the DN return EOF
>> if there is no more data. That way, client B could keep seeing what A has
>> written and hflushed.
>>
>> A real use case for this is HBase log replication. In the region server, a
>> background thread keeps polling for new HLog entries, and it has to reopen
>> the file every second. This may put pressure on the NN as the number of
>> region servers grows.
>>
>> Please correct me if there is anything wrong.
>>
>> Thanks,
>> Chao
>>
>
>
