[ https://issues.apache.org/jira/browse/FLUME-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13614991#comment-13614991 ]

Roshan Naik commented on FLUME-1227:
------------------------------------

I am not particularly wedded to the current approach. My first attempt was 
based on your suggestion to inline the config of the overflow channel in the SC 
itself, but I discovered some [serious issues|https://issues.apache.org/jira/browse/FLUME-1227?focusedCommentId=13540116&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13540116] 
with it, so I pursued the alternative that had been discussed (but without 
consensus). The intent was to get the less contentious core logic working and 
then return quickly to this phase of getting feedback on the shaky parts.

- Since you mention it, explicitly depending on FC (I assume by invoking 'new 
FileChannel()' inside SC) has not been discussed. It might be worth 
considering.

- Forking FC / creating yet another durable channel: this has been talked about, 
and the concern has been duplication of code (perhaps the most complex piece of 
Flume code). I think Juhani also noted the same, and I am concerned about it too. 
If forked, each FC bug would have to be fixed in two places. FC keeps evolving, 
so the fork will likely become stale. I wonder if it makes sense to derive a 
class from FC and use it as the overflow instead.

- Your unresolved code review question: we spoke about this when we met at the 
Flume meetup. On restart, the overflow is drained completely first. This is 
addressed in the design doc under 'recovery from failures', but perhaps not very 
clearly.

- Yes, if SC does not have to guarantee strict ordering, then as long as the 
counts in the DOQ are correct, things will work fine. Ordering guarantees from 
the overflow are needed only if SC is required to provide an ordering guarantee. 
We already have consensus that SC will not rely on any FC guarantees that are 
not explicit.

- I totally agree with Hari and you on the transactionCapacity issue. It makes 
total sense to expose channel size and capacity at the channel interface. I 
didn't do it in the first patch because I was afraid it might become a big point 
of contention; perhaps a misplaced fear. MemC, FC and JdbcC may need minor 
tweaks for it. If there are no objections, I can go ahead and make this change.
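To make the DOQ point concrete, here is a minimal Java sketch. It assumes DOQ stands for a drain-order queue that records runs of consecutive puts (positive counts for events held in memory, negative counts for events spilled to the overflow), so the channel knows which store to take from next. The names `DrainOrderQueue`, `putPrimary`, `putOverflow`, `take` and `afterRestart` are hypothetical and are not taken from the patch. `afterRestart` models the recovery rule described above: after a restart the in-memory events are gone, so the queue holds a single overflow run and the overflow is drained completely first.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical drain-order queue (DOQ) sketch; not the code from the patch.
// Each element is a run of consecutive puts:
//   > 0 : that many events sit in the primary (memory) queue
//   < 0 : that many events sit in the overflow store
final class DrainOrderQueue {
    private final Deque<Long> runs = new ArrayDeque<>();

    // Recovery rule: after a restart only the overflow survives,
    // so it forms a single run that is drained completely first.
    static DrainOrderQueue afterRestart(long overflowSize) {
        DrainOrderQueue q = new DrainOrderQueue();
        if (overflowSize > 0) {
            q.putOverflow(overflowSize);
        }
        return q;
    }

    void putPrimary(long n) { append(requirePositive(n)); }

    void putOverflow(long n) { append(-requirePositive(n)); }

    private static long requirePositive(long n) {
        if (n <= 0) {
            throw new IllegalArgumentException("count must be positive: " + n);
        }
        return n;
    }

    // Extends the last run if the put went to the same store, else starts a new run.
    private void append(long signed) {
        Long last = runs.peekLast();
        if (last != null && Long.signum(last) == Long.signum(signed)) {
            runs.pollLast();
            runs.addLast(last + signed);
        } else {
            runs.addLast(signed);
        }
    }

    boolean isEmpty() { return runs.isEmpty(); }

    // True if the next event to take lives in the overflow store.
    boolean nextIsOverflow() {
        Long head = runs.peekFirst();
        return head != null && head < 0;
    }

    // Records that n events were taken from the store at the head of the queue.
    void take(long n) {
        Long head = runs.peekFirst();
        if (head == null || n <= 0 || n > Math.abs(head)) {
            throw new IllegalStateException("take(" + n + ") does not match head run " + head);
        }
        runs.pollFirst();
        long remaining = head < 0 ? head + n : head - n;
        if (remaining != 0) {
            runs.addFirst(remaining);
        }
    }
}
```

Because consecutive puts to the same store are merged, the queue grows with the number of switches between memory and overflow rather than with the number of events. Correctness depends on every put and take being recorded, which is why the counts must stay accurate.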


I think the only remaining open issue now is how to deal with the overflow. Let 
me list the options that have been put forward so far, plus a few more: 

1) User specifies in config which channel to use as the overflow: the current 
approach, and it has given me all the grief that I anticipated :)
2) Fork FC / create yet another durable FC-like store, then embed it in SC. 
Some comments have been made on this already.
3) Explicitly instantiate FC directly inside SC. 
4) Derive another class from FC and embed it in SC.
5) Based on Mike's comment about SinkProcessors: does it make sense to 
experiment with the notion of ChannelProcessors? 
6) Any other ideas? Now would be THE time to speak.
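Whichever option is chosen for the overflow, the core put/take path can be written against a small store interface, so options 1, 3 and 4 would differ mainly in which implementation is plugged in. The following is a minimal Java sketch under that assumption. `OverflowStore`, `InMemoryOverflow` and `SpillingBuffer` are hypothetical names, the in-memory overflow stands in for a durable file-backed store only so the example runs, and transactions are omitted entirely.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical overflow store interface; a real implementation would be durable.
interface OverflowStore<E> {
    void put(E event);
    E take();      // returns null when the store is empty
    long size();
}

// In-memory stand-in so the sketch runs without disk I/O (NOT durable).
final class InMemoryOverflow<E> implements OverflowStore<E> {
    private final Deque<E> queue = new ArrayDeque<>();
    public void put(E event) { queue.addLast(event); }
    public E take() { return queue.pollFirst(); }
    public long size() { return queue.size(); }
}

// Hypothetical core of a spillable channel: memory first, overflow when memory is full.
final class SpillingBuffer<E> {
    private final Deque<E> memory = new ArrayDeque<>();
    private final int memoryCapacity;
    private final OverflowStore<E> overflow;

    SpillingBuffer(int memoryCapacity, OverflowStore<E> overflow) {
        if (memoryCapacity < 0) {
            throw new IllegalArgumentException("memoryCapacity must not be negative");
        }
        this.memoryCapacity = memoryCapacity;
        this.overflow = overflow;
    }

    void put(E event) {
        if (memory.size() < memoryCapacity) {
            memory.addLast(event);   // fast path: keep the event in memory
        } else {
            overflow.put(event);     // memory is full: spill to the overflow store
        }
    }

    // Serves memory first, then falls back to the overflow; null when both are empty.
    E take() {
        E event = memory.pollFirst();
        return event != null ? event : overflow.take();
    }

    int memorySize() { return memory.size(); }

    long overflowSize() { return overflow.size(); }
}
```

This sketch does not preserve strict ordering: if a put arrives after memory has dropped below capacity while older events are still in the overflow, the new event is served before the spilled ones. That is acceptable only if SC does not have to guarantee strict ordering; otherwise the channel would need to track drain order explicitly.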


                
> Introduce some sort of SpillableChannel
> ---------------------------------------
>
>                 Key: FLUME-1227
>                 URL: https://issues.apache.org/jira/browse/FLUME-1227
>             Project: Flume
>          Issue Type: New Feature
>          Components: Channel
>            Reporter: Jarek Jarcec Cecho
>            Assignee: Roshan Naik
>         Attachments: 1227.patch.1, SpillableMemory Channel Design.pdf
>
>
> I would like to introduce a new channel that would behave similarly to scribe 
> (https://github.com/facebook/scribe). It would be something between the memory 
> and file channels. Input events would be saved directly to memory (only) 
> and would be served from there. If the memory became full, we would spill 
> the events to a file.
> Let me describe the use case behind this request. We have plenty of frontend 
> servers that are generating events. We want to send all events to just a 
> limited number of machines, from which we would send the data to HDFS (a 
> sort of staging layer). The reason for this second layer is our need to 
> decouple event aggregation and frontend code onto separate machines. Using the 
> memory channel is fully sufficient, as we can survive the loss of some portion 
> of the events. However, in order to sustain maintenance windows or networking 
> issues, we would end up with a lot of memory assigned to those "staging" 
> machines. The referenced "scribe" deals with this problem by implementing the 
> following logic: events are saved in memory, similarly to our MemoryChannel. 
> However, if the memory gets full (because of maintenance, networking 
> issues, ...), it spills data to disk, where it sits until everything starts 
> working again.
> I would like to introduce a channel that would implement similar logic. Its 
> durability guarantees would be the same as MemoryChannel's: if someone pulled 
> the power cord, this channel would lose data. Based on the discussion in 
> FLUME-1201, I would propose to keep the implementation completely independent 
> of any other channel's internal code.
> Jarcec

