[ 
https://issues.apache.org/jira/browse/FLUME-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13234087#comment-13234087
 ] 

[email protected] commented on FLUME-896:
-----------------------------------------------------



bq.  On 2012-03-21 02:29:18, Arvind Prabhakar wrote:
bq.  > Thanks for the patch Brock. The changes look good. Some feedback follows:
bq.  > 
bq.  > General:
bq.  > * There are a few places where whitespace has creeped in - highlighted 
as red in the review. Please remove it in the final patch.
bq.  > * Also, there are a few places where the line length exceeds 80 chars, 
will be nice to take care of that as well in the final patch.
bq.  > 
bq.  > Design/Impl:
bq.  > * The FileBackedTransaction.doCommit() only calls the 
fileChannel.commitSequenceID(lastSequenceId) if the lastSequenceId is greater 
than 0. This will not be the case if the channel has no sinks for it. As a 
result everything will be replayed every time the channel starts up, causing 
FileChannel.configure() to fail with OOM after a certain threshold is breached. 
Ideally, the sequeceId commit should happen on every commit call.
bq.  > * On the same lines, the implementation should override the stop() 
method to perform cleanup and close any of the WAL resources that may be 
actively in use.
bq.  > * The take implementation seems to be limited to the contents of 
MemoryChannel only. This will not have any of the events that were persisted in 
the previous run of the FileChannel and therefore those events will get locked 
out of any sink's reach.
bq.  > * Finally, it is not clear to me from the implementation how the 
concurrent semantic will work. Can you please explain (or better still javadoc) 
the expected workings of the WAL implementation?
bq.  >

bq.  The FileBackedTransaction.doCommit() only calls the 
fileChannel.commitSequenceID(lastSequenceId) if the lastSequenceId is greater 
than 0. This will not be the case if the channel has no sinks for it. As a 
result everything will be replayed every time the channel starts up, causing 
FileChannel.configure() to fail with OOM after a certain threshold is breached. 
Ideally, the sequeceId commit should happen on every commit call.

We can only commit sequence ids which have been read off the queue. All 
SequenceIDs will be greater than zero so any event read off the queue and 
committed will have its sequence id written to the WAL. Note we can still OOM 
when the number of sequence ids *not events* which have been written to disk 
exceed 8.3 million (64MB (default direct memory size) / 8 bytes). We can 
predicate when this may occur and print a message saying how much 
-XX:MaxDirectMemorySize needs to be increased when it does occur.

At this point, I think the only alternative is to implement complex merge sort 
logic for the sequence id portion of the WAL.

bq.  On the same lines, the implementation should override the stop() method to 
perform cleanup and close any of the WAL resources that may be actively in use.

done

bq.  The take implementation seems to be limited to the contents of 
MemoryChannel only. This will not have any of the events that were persisted in 
the previous run of the FileChannel and therefore those events will get locked 
out of any sink's reach.

When we replay the log we re-populate MemoryChannel so events there were 
committed to MemoryChannel will be persisted to disk.

bq.  Finally, it is not clear to me from the implementation how the concurrent 
semantic will work. Can you please explain (or better still javadoc) the 
expected workings of the WAL implementation?

I updated the patch based on some testing on a cluster I did. I found that 
sequence id writes were waiting on event writes often. As such, now the only 
synchronization WAL does is when a roll is occurring. Otherwise the 
synchronization has been pushed down to the object actually writing to the file 
WALDataFile.Writer.


bq.  On 2012-03-21 02:29:18, Arvind Prabhakar wrote:
bq.  > 
flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FileChannel.java,
 line 90
bq.  > <https://reviews.apache.org/r/4325/diff/4/?file=92219#file92219line90>
bq.  >
bq.  >     It will be great if this memory channel can also be configured 
correctly via the system configuration. That way, the end user will be able to 
specify the cap on the number of events that can live in all the active 
transactions combined.
bq.  >     
bq.  >     This will help ensure that this channel does not run into OOM issues.

done


bq.  On 2012-03-21 02:29:18, Arvind Prabhakar wrote:
bq.  > 
flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FileChannel.java,
 line 108
bq.  > <https://reviews.apache.org/r/4325/diff/4/?file=92219#file92219line108>
bq.  >
bq.  >     This constructor does not allow the specification of roll size, roll 
required interval, max log size, min log retention period and worker interval.  
All of these properties should be extracted from the context and passed into 
the WAL constructor.

done


bq.  On 2012-03-21 02:29:18, Arvind Prabhakar wrote:
bq.  > 
flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FileChannel.java,
 lines 109-111
bq.  > <https://reviews.apache.org/r/4325/diff/4/?file=92219#file92219line109>
bq.  >
bq.  >     This logic is probably better done in the start() method than in the 
configure method.

done


bq.  On 2012-03-21 02:29:18, Arvind Prabhakar wrote:
bq.  > 
flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/SequenceIDBuffer.java,
 line 1
bq.  > <https://reviews.apache.org/r/4325/diff/4/?file=92221#file92221line1>
bq.  >
bq.  >     Missing license header.

done


bq.  On 2012-03-21 02:29:18, Arvind Prabhakar wrote:
bq.  > 
flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/wal/TestSequenceIDBuffer.java,
 line 1
bq.  > <https://reviews.apache.org/r/4325/diff/4/?file=92227#file92227line1>
bq.  >
bq.  >     Missing license header.

done


bq.  On 2012-03-21 02:29:18, Arvind Prabhakar wrote:
bq.  > 
flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FileChannel.java,
 line 84
bq.  > <https://reviews.apache.org/r/4325/diff/4/?file=92219#file92219line84>
bq.  >
bq.  >     The JDBC channel defaults to ${user.home}/.flume/jdbc-channel. I 
suggest defaulting the file channel working directory to something similar like 
${user.home}/.flume/file-channel

done


bq.  On 2012-03-21 02:29:18, Arvind Prabhakar wrote:
bq.  > 
flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FileChannel.java,
 lines 61-65
bq.  > <https://reviews.apache.org/r/4325/diff/4/?file=92219#file92219line61>
bq.  >
bq.  >     Please override the initialize() method to do any initialization for 
the system.

This is done in the configure method since this should be overridden.


bq.  On 2012-03-21 02:29:18, Arvind Prabhakar wrote:
bq.  > flume-ng-channels/flume-file-channel/pom.xml, lines 76-78
bq.  > <https://reviews.apache.org/r/4325/diff/4/?file=92218#file92218line76>
bq.  >
bq.  >     This change seems to be unrelated to the file channel. If that is 
the case, we should factor it out.

We need hadoop as a dep so we can use writables.


- Brock


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4325/#review6126
-----------------------------------------------------------


On 2012-03-21 04:45:55, Brock Noland wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4325/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-03-21 04:45:55)
bq.  
bq.  
bq.  Review request for Flume.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  Attached patch not meant for commit. Just posting here for easy review.
bq.  
bq.  
bq.  This addresses bug FLUME-896.
bq.      https://issues.apache.org/jira/browse/FLUME-896
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    flume-ng-channels/flume-file-channel/pom.xml 48d1481 
bq.    
flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FileChannel.java
 a279453 
bq.    
flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FileChannelEvent.java
 PRE-CREATION 
bq.    
flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/SequenceIDBuffer.java
 PRE-CREATION 
bq.    
flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WAL.java
 PRE-CREATION 
bq.    
flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WALDataFile.java
 PRE-CREATION 
bq.    
flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WALEntry.java
 PRE-CREATION 
bq.    
flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/wal/WALReplayResult.java
 PRE-CREATION 
bq.    
flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/TestFileChannel.java
 ab66998 
bq.    
flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/wal/TestSequenceIDBuffer.java
 PRE-CREATION 
bq.    
flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/wal/TestWAL.java
 PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/4325/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Brock
bq.  
bq.


                
> Implement file write ahead log channel
> --------------------------------------
>
>                 Key: FLUME-896
>                 URL: https://issues.apache.org/jira/browse/FLUME-896
>             Project: Flume
>          Issue Type: New Feature
>          Components: Channel
>    Affects Versions: NG alpha 1
>            Reporter: E. Sammer
>            Assignee: Brock Noland
>             Fix For: v1.2.0
>
>         Attachments: FLUME-896-1.patch, FLUME-896-4.patch, FLUME-896-5.patch
>
>
> Implement a channel that uses a regular file system and a write ahead log for 
> durable event delivery.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to