Hi Yanbo,

> Write edits log to HBase, additions are appended to the end of the WAL
> file rather than reopen the HDFS file second.

I mean the HBase replication procedure. After edits are appended to the HLog, I think there is a background thread that periodically polls new edits from the HLog and syncs them to the slave clusters.

2013/12/27 Yanbo Liang <yanboha...@gmail.com>

> Hi Chao,
>
> As far as I know, if client B opens a file which is under construction,
> the DFSInputStream will get the LocatedBlocks object, and it contains a
> member variable called "underConstruction" to mark that this file is
> under construction.
> If the file is reopened, the client will get a different length. I think
> this makes sense because the file is no longer the old one but one
> with newly appended data.
>
> Write edits log to HBase, additions are appended to the end of the WAL
> file rather than reopen the HDFS file second.
>
>
> 2013/12/27 Chao Shi <stepi...@live.com>
>
>> Hi users,
>>
>> Suppose a client A opens /f and keeps appending data and then hflushing.
>> Another client B opens this file for read. I found that B can only see the
>> snapshot of the data at the time he opens the file. (After B's opening, A
>> may continue to write more data. B cannot see it unless he reopens the
>> file.)
>>
>> Looking into the code, I think this is because DFSInputStream maintains a
>> file length and simply reports EOF when we read beyond the file length. The
>> file length is updated, and thus the client has a chance to see a longer
>> file, when:
>> 1) the file is opened
>> 2) there are no live DNs to read from (correct? not very sure.)
>>
>> I think such behaviour is inconsistent. Clients may see a sudden change
>> of file length. I guess a better behaviour is to always try to read beyond
>> the known file length at the client side and let the DN return EOF if
>> there is no more data. In this way, client B can continue to see what A
>> wrote and hflushed.
>>
>> A real use case for this is HBase log replication. In the region server,
>> there is a background thread that keeps polling for new HLog entries. It
>> has to reopen every second.
>> This may put pressure on the NN if the number of region servers gets
>> larger.
>>
>> Please correct me if there is anything wrong.
>>
>> Thanks,
>> Chao
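
To make the two reader behaviours discussed in this thread concrete, here is a small in-memory simulation. It is only a sketch and does not use any HDFS or HBase API: `SharedFile`, `SnapshotReader`, `ProbingReader` and `demo` are hypothetical names invented for illustration. `SnapshotReader` models a reader that caches the file length at open time and reports EOF at that length, and `ProbingReader` models the proposed alternative of always asking the data side and letting it decide where EOF is. The `open_calls` counter stands in for the metadata lookups that would reach the NN.

```python
class SharedFile:
    """In-memory stand-in for a file being appended to (not a real HDFS API)."""

    def __init__(self):
        self._data = bytearray()
        self._flushed = 0      # bytes made visible to readers by hflush()
        self.open_calls = 0    # stand-in for metadata requests sent to the NN

    def append(self, payload):
        self._data += payload

    def hflush(self):
        # Everything appended so far becomes visible to readers.
        self._flushed = len(self._data)

    def open_for_read(self):
        # Opening costs one metadata lookup and returns the visible length.
        self.open_calls += 1
        return self._flushed

    def read_from_dn(self, offset, size):
        # The data side knows the real flushed length and stops there.
        end = min(offset + size, self._flushed)
        return bytes(self._data[offset:end])


class SnapshotReader:
    """Caches the length at open time and reports EOF at that length."""

    def __init__(self, shared):
        self._shared = shared
        self._pos = 0
        self._length = shared.open_for_read()

    def read(self, size=1024):
        if self._pos >= self._length:
            return b""  # EOF according to the cached length
        size = min(size, self._length - self._pos)
        chunk = self._shared.read_from_dn(self._pos, size)
        self._pos += len(chunk)
        return chunk


class ProbingReader:
    """Always asks the data side, which returns b"" when nothing new exists."""

    def __init__(self, shared):
        self._shared = shared
        self._pos = 0
        shared.open_for_read()  # a single metadata lookup at open time

    def read(self, size=1024):
        chunk = self._shared.read_from_dn(self._pos, size)
        self._pos += len(chunk)
        return chunk


def demo():
    shared = SharedFile()
    shared.append(b"edit-1;")
    shared.hflush()

    snapshot = SnapshotReader(shared)
    probing = ProbingReader(shared)
    first = (snapshot.read(), probing.read())

    # Client A keeps writing after both readers have opened the file.
    shared.append(b"edit-2;")
    shared.hflush()
    later = (snapshot.read(), probing.read())

    # The snapshot reader only sees the new edit after reopening the file.
    reopened = SnapshotReader(shared)
    return first, later, reopened.read(), shared.open_calls


if __name__ == "__main__":
    print(demo())
```

In this toy model the snapshot-style reader needs one extra open per polling round to see new edits, while the probing reader keeps its single open. That mirrors the concern about NN load above, but it says nothing about how the real DFSInputStream is implemented.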