[jira] [Commented] (FLUME-1227) Introduce some sort of SpillableChannel

Juhani Connolly (JIRA) Thu, 21 Mar 2013 19:33:17 -0700

    [ https://issues.apache.org/jira/browse/FLUME-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13609824#comment-13609824 ]


Juhani Connolly commented on FLUME-1227:
----------------------------------------

I had a look at the design doc and comments, so I thought I'd chip in.

As long as we depend only on the Channel interface for behavior, I think 
we're good; I believe this was also the intention of an earlier proposal for 
this feature.
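A minimal sketch of composing sub-channels purely through an interface. It assumes a hypothetical, simplified `SimpleChannel` contract (non-transactional `offer`/`poll`) instead of Flume's real, transactional `org.apache.flume.Channel`. The hypothetical `CompositeChannel` hands an event to the second channel only when the first one refuses it, roughly the spill-on-full behavior requested in the issue:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical, simplified channel contract for illustration only.
// This is NOT Flume's org.apache.flume.Channel, which is transactional.
interface SimpleChannel {
    boolean offer(String event); // returns false if the channel is full
    String poll();               // returns null if the channel is empty
}

// Bounded in-memory channel used as a stand-in for any implementation.
class BoundedQueueChannel implements SimpleChannel {
    private final Deque<String> queue = new ArrayDeque<>();
    private final int capacity;

    BoundedQueueChannel(int capacity) { this.capacity = capacity; }

    @Override public boolean offer(String event) {
        if (queue.size() >= capacity) return false;
        queue.addLast(event);
        return true;
    }

    @Override public String poll() { return queue.pollFirst(); }
}

// Composite that only talks to its sub-channels through SimpleChannel,
// so any implementation (memory, file, ...) can be plugged in.
class CompositeChannel implements SimpleChannel {
    private final SimpleChannel primary;
    private final SimpleChannel overflow;

    CompositeChannel(SimpleChannel primary, SimpleChannel overflow) {
        this.primary = primary;
        this.overflow = overflow;
    }

    // Try the primary first; spill to the overflow channel when it is full.
    @Override public boolean offer(String event) {
        return primary.offer(event) || overflow.offer(event);
    }

    // Serve from the primary first, then from the overflow channel.
    @Override public String poll() {
        String event = primary.poll();
        return event != null ? event : overflow.poll();
    }
}
```

Because taking from the primary frees space there, newer events can be served before older spilled ones, so this sketch does not preserve ordering.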

I agree with Hari about ordering. It's not a guarantee we enforce in Flume, and 
while it would be nice, I think it over-complicates things.

As to lifecycle management, I don't necessarily feel that having a channel own 
its sub-channels is a particularly good precedent. I think it would be 
preferable to let the lifecycle manager return interfaces rather than have 
components explicitly create other components. Configuration would have to 
have some grasp of dependencies, though: sub-channels would need to be 
instantiated before their "owner".
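One way configuration could track those dependencies is a depth-first topological sort over declared dependencies, so sub-channels are always created before their owner. This is a hypothetical sketch: the `ComponentOrder` class and the name-to-dependencies map are illustrative assumptions, not part of Flume's configuration code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: computes a creation order in which every component
// comes after the components it depends on (e.g. sub-channels before owner).
class ComponentOrder {
    static List<String> creationOrder(Map<String, List<String>> deps) {
        List<String> order = new ArrayList<>();
        Set<String> done = new HashSet<>();
        Set<String> inProgress = new HashSet<>();
        for (String name : deps.keySet()) {
            visit(name, deps, done, inProgress, order);
        }
        return order;
    }

    // Depth-first visit: emit all dependencies first, then the component.
    private static void visit(String name, Map<String, List<String>> deps,
                              Set<String> done, Set<String> inProgress,
                              List<String> order) {
        if (done.contains(name)) return;
        if (!inProgress.add(name)) {
            // Revisiting a component still on the stack means a cycle.
            throw new IllegalStateException("dependency cycle involving " + name);
        }
        for (String dep : deps.getOrDefault(name, Collections.emptyList())) {
            visit(dep, deps, done, inProgress, order);
        }
        inProgress.remove(name);
        done.add(name);
        order.add(name);
    }
}
```

Cycles are rejected rather than resolved, since a channel cannot sensibly be its own (indirect) sub-channel.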

As to the fsync thing: it definitely should be an option, though that's a 
separate issue. Making it possible to disable fsync would be great. Since this 
channel already relies on in-memory data, durability really shouldn't be an 
issue. If your data is in memory, it doesn't really matter whether it's in the 
memory channel or in the OS file buffer.
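To illustrate fsync as an option, here is a sketch of a hypothetical spill-file writer (`SpillFileWriter` and its `fsyncOnWrite` flag are assumptions, not existing Flume classes or settings). It uses `java.nio.channels.FileChannel.force`, the call that asks the OS to flush written bytes to the storage device; when it is skipped, the bytes may remain in the OS page cache:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical spill-file writer where fsync is a constructor option.
class SpillFileWriter implements AutoCloseable {
    private final FileChannel file;
    private final boolean fsyncOnWrite;

    SpillFileWriter(Path path, boolean fsyncOnWrite) {
        try {
            this.file = FileChannel.open(path, StandardOpenOption.CREATE,
                    StandardOpenOption.WRITE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        this.fsyncOnWrite = fsyncOnWrite;
    }

    // Appends one newline-terminated event to the spill file.
    void append(String event) {
        ByteBuffer buf = ByteBuffer.wrap((event + "\n").getBytes(StandardCharsets.UTF_8));
        try {
            while (buf.hasRemaining()) file.write(buf);
            // Force to the device only when durability is wanted; otherwise
            // the data may sit in the OS page cache, which is no less durable
            // than keeping it in the JVM heap.
            if (fsyncOnWrite) file.force(false);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override public void close() {
        try {
            file.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```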

One thing you may want to consider is the approach taken by scribed (which has 
other problems, but its buffer store implementation is very nice):
- Default to using the main channel.
- Upon a next-hop failure (a rollback of the take transaction, in our case), 
switch to a buffering mode. All data is sent to the buffer channel until 
recovery. One may want to move the contents of the primary channel to the 
buffer if maintaining ordering is an objective. This could also reduce data 
loss.
- During buffering mode, puts and takes go to the buffer channel until it has 
been drained. Once it has been drained, return to "streaming" mode, where 
operations are performed against the primary channel.
                
> Introduce some sort of SpillableChannel
> ---------------------------------------
>
>                 Key: FLUME-1227
>                 URL: https://issues.apache.org/jira/browse/FLUME-1227
>             Project: Flume
>          Issue Type: New Feature
>          Components: Channel
>            Reporter: Jarek Jarcec Cecho
>            Assignee: Roshan Naik
>         Attachments: 1227.patch.1, SpillableMemory Channel Design.pdf
>
>
> I would like to introduce a new channel that would behave similarly to scribe 
> (https://github.com/facebook/scribe). It would be something between the memory 
> and file channels. Input events would be saved directly to memory (only) 
> and would be served from there. If the memory became full, we would spill 
> the events to a file.
> Let me describe the use case behind this request. We have plenty of frontend 
> servers that are generating events. We want to send all events to just a 
> limited number of machines, from where we would send the data to HDFS (some 
> sort of staging layer). The reason for this second layer is our need to 
> decouple event aggregation and front-end code onto separate machines. Using 
> the memory channel is fully sufficient, as we can survive the loss of some 
> portion of the events. However, in order to sustain maintenance windows or 
> networking issues, we would end up having to assign a lot of memory to those 
> "staging" machines. The referenced "scribe" deals with this problem with the 
> following logic: events are saved in memory, similarly to our MemoryChannel. 
> However, if the memory gets full (because of maintenance, networking 
> issues, ...), it spills data to disk, where it sits until everything 
> starts working again.
> I would like to introduce a channel that would implement similar logic. Its 
> durability guarantees would be the same as MemoryChannel's: if someone 
> pulled the power cord, this channel would lose data. Based on the 
> discussion in FLUME-1201, I would propose keeping the implementation 
> completely independent of any other channel's internal code.
> Jarcec

