[ https://issues.apache.org/jira/browse/FLUME-1104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13247990#comment-13247990 ]

[email protected] commented on FLUME-1104:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4662/
-----------------------------------------------------------

(Updated 2012-04-06 03:06:43.587885)


Review request for Flume.


Changes
-------

1) Conditionally add the bucket writer to the local list.
2) Start the rolled file extension at the system timestamp instead of 0.
3) Update the tests that count the number of files generated. The patch avoids
the unnecessary close, which reduces the file count.
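
A minimal sketch of change 2, assuming a hypothetical RollFileNamer class and
an example path (neither is the actual BucketWriter code): each rolled file is
named <basePath>.<N>, where N starts at System.currentTimeMillis() rather than 0
and increases by one per roll. One likely motivation (an assumption here) is
that a restarted sink will not reuse the file names of an earlier run.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch (not Flume's BucketWriter): names rolled files as
// <basePath>.<N>, where N starts at a millisecond timestamp instead of 0.
class RollFileNamer {
    private final String basePath;
    private final AtomicLong extensionCounter;

    // Starts the extension counter at an explicit value (handy for testing).
    RollFileNamer(String basePath, long startValue) {
        this.basePath = basePath;
        this.extensionCounter = new AtomicLong(startValue);
    }

    // Starts the extension counter at the current system time in milliseconds.
    RollFileNamer(String basePath) {
        this(basePath, System.currentTimeMillis());
    }

    // Returns the name for the next rolled file, then advances the counter.
    String nextFileName() {
        return basePath + "." + extensionCounter.getAndIncrement();
    }

    public static void main(String[] args) {
        // Hypothetical base path, for illustration only.
        RollFileNamer namer = new RollFileNamer("/flume/events/FlumeData");
        System.out.println(namer.nextFileName()); // e.g. /flume/events/FlumeData.<current millis>
        System.out.println(namer.nextFileName()); // the same number plus one
    }
}
```

An AtomicLong keeps the counter safe if several threads roll files through the
same namer; a plain long would suffice for single-threaded use.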


Summary
-------

The sink's process() method keeps track of the buckets opened during the
transaction. At the end of the transaction, we need to flush all the buckets
that have pending data. This ensures that the data removed from the channel is
safely in HDFS when the transaction commits.
Currently, files are tracked only when they are created, and they are closed
during cleanup instead of being flushed.

The fix is to track a bucket every time it is written to in the current
transaction, and to flush (rather than close) the buckets with pending data.
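
A minimal sketch of the per-transaction tracking described above, assuming
hypothetical SketchBucket and SketchSink classes (not Flume's actual
BucketWriter or HDFSEventSink API). HDFS writes and the channel Transaction are
simulated with in-memory lists: every bucket written in a transaction is added
to a local list once, and before the simulated commit each one with pending
data is flushed but left open.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a bucket writer: appends are buffered until flushed.
class SketchBucket {
    private final String path;
    private final List<String> pending = new ArrayList<>();
    private final List<String> durable = new ArrayList<>(); // simulates data safely in HDFS
    private boolean open = true;

    SketchBucket(String path) {
        this.path = path;
    }

    void append(String event) {
        if (!open) {
            throw new IllegalStateException("bucket already closed: " + path);
        }
        pending.add(event);
    }

    boolean hasPendingData() {
        return !pending.isEmpty();
    }

    // Makes buffered events durable but keeps the file open for later transactions.
    void flush() {
        durable.addAll(pending);
        pending.clear();
    }

    // Flushes, then closes; a closed bucket rejects further appends.
    void close() {
        flush();
        open = false;
    }

    boolean isOpen() {
        return open;
    }

    List<String> durableEvents() {
        return durable;
    }
}

// Hypothetical sink: each call to processBatch stands in for one channel transaction.
class SketchSink {
    private final Map<String, SketchBucket> openBuckets = new HashMap<>();

    // Each element of batch is a {path, event} pair.
    void processBatch(String[][] batch) {
        // Buckets written to in THIS transaction, recorded on every write,
        // not only when a bucket is first created.
        List<SketchBucket> writtenThisTxn = new ArrayList<>();
        for (String[] pathAndEvent : batch) {
            SketchBucket bucket = openBuckets.computeIfAbsent(pathAndEvent[0], SketchBucket::new);
            bucket.append(pathAndEvent[1]);
            if (!writtenThisTxn.contains(bucket)) { // conditional add: track each bucket once
                writtenThisTxn.add(bucket);
            }
        }
        // Before the (simulated) channel commit: flush buckets with pending data,
        // leaving them open instead of closing them.
        for (SketchBucket bucket : writtenThisTxn) {
            if (bucket.hasPendingData()) {
                bucket.flush();
            }
        }
        // A real sink would commit the channel transaction here.
    }

    SketchBucket bucket(String path) {
        return openBuckets.get(path);
    }

    public static void main(String[] args) {
        SketchSink sink = new SketchSink();
        sink.processBatch(new String[][] {{"/logs/a", "e1"}, {"/logs/b", "e2"}});
        sink.processBatch(new String[][] {{"/logs/a", "e3"}}); // same bucket, later transaction
        System.out.println(sink.bucket("/logs/a").durableEvents()); // [e1, e3]
        System.out.println(sink.bucket("/logs/a").isOpen());        // true
    }
}
```

Recording the bucket on every write, not only on creation, is what lets a bucket
opened in an earlier transaction (here "/logs/a" in the second batch) be flushed
again before a later commit; flushing instead of closing keeps that file open, so
fewer files are produced.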


This addresses bug FLUME-1104.
    https://issues.apache.org/jira/browse/FLUME-1104


Diffs (updated)
-----

  flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/BucketWriter.java 7a94f97
  flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSEventSink.java 114682a
  flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/TestHDFSEventSink.java 6ff3737

Diff: https://reviews.apache.org/r/4662/diff


Testing
-------


Thanks,

Prasad


> HDFS rolls the first file incorrectly
> -------------------------------------
>
>                 Key: FLUME-1104
>                 URL: https://issues.apache.org/jira/browse/FLUME-1104
>             Project: Flume
>          Issue Type: Bug
>          Components: Sinks+Sources
>    Affects Versions: v1.2.0
>            Reporter: Prasad Mujumdar
>            Assignee: Prasad Mujumdar
>         Attachments: FLUME-1104-2.patch
>