hi Peter,
    Is there really no operation log for that gap? Please enable the state change log at DEBUG level. I checked the code path for creating the 'symlink' file; it looks like a regular operation, handled the same way as the other NameNode interfaces. GC, the directory lock, or an external shell script are the points I would suspect.
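    For reference, the state change messages come from the org.apache.hadoop.hdfs.StateChange logger visible in the excerpts below. Assuming the stock log4j setup that ships with Hadoop, one line in the NameNode's conf/log4j.properties should raise it to DEBUG (a sketch, not verified against your exact version):

# conf/log4j.properties on the NameNode (restart the NameNode for it to take effect)
log4j.logger.org.apache.hadoop.hdfs.StateChange=DEBUG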
-Regards
Denny Ye

2012/2/24 Tianqiang Peter Li <peter...@gmail.com>

> Hi, Denny,
> Thanks for the info, you are right about this. After some digging, I found
> there was no data loss. The root cause of the empty file is that during
> the file rotation, the old file was closed and a new empty file was
> created/opened; HDFS then tried to create the small file (acting as a
> symbolic link in HDFS) for it, but for some reason (GC?) the operation for
> creating this symbolic-link file got stuck for 2 minutes. Right after it
> was finally created, Scribe ran its periodic check, found the file had
> already passed the rotation window (the rotation period is 1 min), and
> rotated out this newly created, still-empty file. It's good to know there
> is no data integrity issue. I am still looking at where this big hiccup
> comes from. Anyway, thanks for the helpful comments on this.
>
> -Peter
>
>
> On Sun, Feb 19, 2012 at 6:25 PM, Denny Ye <denny...@gmail.com> wrote:
>
>> hi Peter,
>>     I met this problem in our environment once. It may be caused by a
>> blank data stream from Scribe to HDFS: in the fixed-rotation model there
>> is no data at all, whereas normally the NameNode starts the 'allocate
>> block' operation when the first data packet arrives from the HDFS client.
>> This is only one hint for your doubt.
>>
>> -Regards
>> Denny Ye
>>
>>
>> 2012/2/19 Tianqiang Peter Li <peter...@gmail.com>
>>
>>> Hi, guys,
>>> I am using Scribe to write data directly to HDFS. It works well most of
>>> the time, but sporadically some of the files (1-2 out of 6000 per day)
>>> written to HDFS end up empty. Interestingly, this only happens when HDFS
>>> is busy handling lots of other write requests at the same time. From the
>>> NameNode's log output, it seems the allocateBlock operation that should
>>> follow the create operation has vanished, unlike in other normal
>>> operations. Here is a comparison of a normal file and an abnormal one:
>>>
>>> ----------- *Normal* one (file:/prefix-2012-02-18_00410) ---------------------------------
>>> 2012-02-18 02:32:15,873 INFO
>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=user ip=/
>>> 10.16.35.236 cmd=create src=/prefix-2012-02-18_00410
>>> dst=null perm=user:supergroup:rw-r--r--
>>> 2012-02-18 02:32:15,905 INFO org.apache.hadoop.hdfs.StateChange: *BLOCK*
>>> NameSystem.allocateBlock*: /prefix-2012-02-18_00410.
>>> blk_-2839230806711337016_129091175
>>> 2012-02-18 02:32:17,422 INFO org.apache.hadoop.hdfs.StateChange:
>>> Removing lease on file /prefix-2012-02-18_00410 from client
>>> DFSClient_-36442571
>>> 2012-02-18 02:32:17,422 INFO org.apache.hadoop.hdfs.StateChange: DIR*
>>> NameSystem.completeFile: file /prefix-2012-02-18_00410 is closed by
>>> DFSClient_-36442571
>>> ----------- End --------------
>>>
>>> ----------- *Problematic* one (file:/prefix-2012-02-18_00409) ----------------------------
>>> 2012-02-18 02:30:47,287 INFO
>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=user ip=/
>>> 10.16.35.236 cmd=create src=/prefix-2012-02-18_00409
>>> dst=null perm=user:supergroup:rw-r--r--
>>> *<comment: no allocateBlock is found, and there is a 60-90 sec gap
>>> between the create and the lease removal>*
>>> 2012-02-18 02:32:15,648 INFO org.apache.hadoop.hdfs.StateChange:
>>> Removing lease on file /prefix-2012-02-18_00409 from client
>>> DFSClient_748605466
>>> 2012-02-18 02:32:15,648 INFO org.apache.hadoop.hdfs.StateChange: DIR*
>>> NameSystem.completeFile: file /prefix-2012-02-18_00409 is closed by
>>> DFSClient_748605466
>>> ----------- End -------------------
>>>
>>> Not sure whether this is more Scribe-related or HDFS-related, so let me
>>> try the question on this mailing list.
>>>
>>> The 90-second delay looks suspicious to me and might be the cause of
>>> this issue. Have you run into this situation before? Any light you can
>>> shed on why it happens and how to work around it, so that the entire file
>>> is guaranteed to be saved into HDFS successfully? Where should I focus
>>> for improvements, on the HDFS side (e.g. HDFS tuning, GC tuning) or on
>>> the application side (e.g. timeout settings on the HDFS API calls in
>>> Scribe)? Your suggestions/thoughts are greatly appreciated.
>>>
>>> Regards,
>>> Peter
>>>
>>>
>>
>
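
As a footnote to Denny's point above, that the NameNode only allocates a block once the first data packet arrives from the client, here is a minimal sketch of the two cases using the standard HDFS Java client. The paths are made up and the exact moment the packet is sent (flush vs. close) depends on the Hadoop version, but it illustrates why an empty file shows create and completeFile in the NameNode log with no allocateBlock in between.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AllocateBlockSketch {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());

        // Case 1: create and close without ever writing data.
        // The NameNode logs cmd=create and NameSystem.completeFile,
        // but no allocateBlock, because no data packet is ever sent.
        FSDataOutputStream empty = fs.create(new Path("/tmp/sketch-empty"));
        empty.close();

        // Case 2: create and write at least one byte before closing.
        // The first data packet from the client triggers
        // NameSystem.allocateBlock on the NameNode.
        FSDataOutputStream withData = fs.create(new Path("/tmp/sketch-with-data"));
        withData.write("one line of payload\n".getBytes("UTF-8"));
        withData.close();

        fs.close();
    }
}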