[ 
https://issues.apache.org/jira/browse/CHUKWA-410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12774210#action_12774210
 ] 

Jiaqi Tan commented on CHUKWA-410:
----------------------------------

 > What do you mean by: "the raw log files are complete"?
 >  --> the datasink file from the collector is complete? 

Actually, let me check on that. I was just wondering whether the semantics of 
WAIT_TILL_FINISHED could allow a race: some blocks are closed before the 
file has been fully written, and the Demux then hits the incomplete file and 
processes only the blocks that had been closed so far. 
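
One way to rule out this kind of race, whatever WAIT_TILL_FINISHED turns out 
to guarantee, would be for the writer to publish a file under its final name 
only after it has been closed, and for the reader to skip anything not yet 
published. The sketch below illustrates that pattern on the local filesystem 
with java.nio; it is not Chukwa or HDFS code, and the class name, method 
names and the ".inprogress" / ".complete" suffixes are all hypothetical, 
chosen only for this example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

public class CompleteFileGuard {
    // Hypothetical suffixes, used only in this sketch.
    static final String IN_PROGRESS = ".inprogress";
    static final String COMPLETE = ".complete";

    /** Writer side: write all records, close the file, then publish it by renaming. */
    static Path writeThenPublish(Path dir, String name, List<String> records)
            throws IOException {
        Path tmp = dir.resolve(name + IN_PROGRESS);
        // Files.write opens, writes and closes the file before returning.
        Files.write(tmp, records, StandardCharsets.UTF_8);
        Path done = dir.resolve(name + COMPLETE);
        // The rename is the single step that makes the file visible to readers.
        return Files.move(tmp, done, StandardCopyOption.ATOMIC_MOVE);
    }

    /** Reader side: return only files that have already been published. */
    static List<Path> listComplete(Path dir) throws IOException {
        List<Path> result = new ArrayList<>();
        try (Stream<Path> files = Files.list(dir)) {
            files.filter(p -> p.getFileName().toString().endsWith(COMPLETE))
                 .forEach(result::add);
        }
        return result;
    }
}
```

With this arrangement a reader started immediately after the writer returns 
either sees the whole file or does not see it at all, so a half-processed 
input would point to a problem elsewhere.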

> Does the BackfillingLoader return only after HDFS blocks are committed?
> -----------------------------------------------------------------------
>
>                 Key: CHUKWA-410
>                 URL: https://issues.apache.org/jira/browse/CHUKWA-410
>             Project: Hadoop Chukwa
>          Issue Type: Bug
>          Components: data collection
>    Affects Versions: 0.3.0
>         Environment: Hadoop 0.20.0, Debian 4 (Etch), Chukwa rev 817532
>            Reporter: Jiaqi Tan
>
> I see that the BackfillingLoader is set to 
> AdaptorShutdownPolicy.WAIT_TILL_FINISHED. What are the semantics of this? 
> Does this mean that the BackfillingLoader returns after the last HDFS write 
> request is made, but the DFSClient could continue to be flushing blocks to 
> the DataNodes in the background? Or does that mean that the entire file has 
> been written/flushed to HDFS and closed and fully available?
> I'm running the Demux immediately after the BackfillingLoader is complete; 
> the raw log files are complete, but the Demux picks up only half of the 
> entries in those log files. Could this be because some blocks are not closed 
> yet?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
