[
https://issues.apache.org/jira/browse/FLUME-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13609360#comment-13609360
]
Hari Shreedharan commented on FLUME-1227:
-----------------------------------------
{quote}
1) WRT the concern on not depending on another channel, I went down this path
since it looked like there was some consensus when I started. What alternative
design do you have in mind?
2) WRT a change in the memory/file channel breaking the Spillable Channel: could
you expand a bit? I am not familiar with the replay order issue and how it can
have an impact. I don't think there is any intrinsic assumption being made about
any specific channel's behavior. Just to be doubly sure, I made sure not to rely
on a single type of overflow channel in all the tests. The only material
dependency (as far as I can tell) that the Spillable Channel has on the overflow
is the interface-level guarantee that is expected from all channels: that order
is maintained in the case of a single source/sink.
Do you see any other assumptions/dependencies hiding there?
{quote}
I am sorry, I was not part of the initial discussions, so I was not aware of
the consensus aspect. What I am saying is that depending on another channel
creates an undesired strong coupling between this channel and the other
channels. And if there are unit tests in this channel that can break when one of
the other channels' behavior is changed, then that is not acceptable. If you
look at all our other components, none of them have a dependency on each other
(except the RPCClients, and that is because the sinks are just glorified
RPCClients).
The reason I would not agree with even the single source/sink replay order is
that our interfaces do not really enforce it. It is not enforced anywhere in
the documentation either. The FileChannel did not even conform to that single
source/sink replay order until FLUME-1432. In fact, conforming to that order
even in FLUME-1432 was a side effect of fixing a race condition, not something
done specifically because it was meant to be guaranteed. If at some point it is
decided this can change again to some other order (maybe a thread-based
ordering, or an order in which all the events in a transaction get written out
together on commit, rather than getting written out on put and fsynced on
commit), and this channel's tests then break, the onus will be on the
contributor who submitted the file channel change to fix them, which I do not
agree with.
In summary, I am ok with depending on other channels. What I am not ok with is
depending on behavior of those channels that is not explicitly guaranteed
through interfaces (or even documentation).
bq. 3) WRT reserving capacity on both channels: if you mean that each txn
should not reserve capacity on both channels, I agree, and the current
implementation does not do that. Or were you by any chance referring to the
issue of upfront reservation (at put() time) versus commit() time?
I am talking about put vs. commit time. Transaction capacity is often
configured to be much higher than the maximum actually expected. I would
suggest doing a full implementation where there is a transaction on the
outside and a backing store on the inside; once the transaction is about to be
committed, decide where the events go. (It is going to be tricky to do this
and avoid doing all the writes at once: the File Channel fsyncs on commit but
writes to OS buffers on every write, so it is possible some data is flushed to
disk before the explicit fsync.) This is not a blocker anyway; we can work on
it later as well.
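A minimal sketch of the "buffer puts in the transaction, pick the destination only at commit" idea (all class and method names here are hypothetical stand-ins, not Flume's actual Channel/Transaction API):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a capacity-bounded backing store.
interface Store {
    boolean tryReserve(int n);        // claim capacity for n events, or refuse
    void write(List<byte[]> events);  // persist the committed batch
}

class FixedCapacityStore implements Store {
    final List<byte[]> data = new ArrayList<>();
    private int free;
    FixedCapacityStore(int capacity) { this.free = capacity; }
    public boolean tryReserve(int n) {
        if (n > free) return false;
        free -= n;
        return true;
    }
    public void write(List<byte[]> events) { data.addAll(events); }
}

class SpillableTransactionSketch {
    private final List<byte[]> putBuffer = new ArrayList<>();
    private final Store memory;
    private final Store overflow;

    SpillableTransactionSketch(Store memory, Store overflow) {
        this.memory = memory;
        this.overflow = overflow;
    }

    void put(byte[] event) {
        // No capacity is reserved here; the event just waits in the txn buffer.
        putBuffer.add(event);
    }

    void commit() {
        int n = putBuffer.size();
        // Capacity is claimed on exactly one store, and only at commit time.
        Store target = memory.tryReserve(n) ? memory
                     : overflow.tryReserve(n) ? overflow
                     : null;
        if (target == null) throw new IllegalStateException("channel full");
        target.write(putBuffer);
        putBuffer.clear();
    }
}
```

The point of the sketch is only that neither store's capacity is touched between put() and commit(), which avoids the double reservation while deferring the memory-vs-overflow decision.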
bq. 4) WRT testing with fsyncs removed, I have not pursued it since I felt
that would compromise the durability guarantees. Do you think it's useful to
do that?
I was wondering whether simply adding a config param to make the fsyncs
optional (fsync all files before checkpoint in parallel, or something) would
give comparable performance to what is being proposed in this jira. I have a
feeling it might, since fsyncs are the most expensive part of the file
channel; removing them means writes only land in the in-memory OS buffers,
and the actual flushing to disk is taken care of in the background.
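Roughly the shape of what I mean, as a sketch (OptionalFsyncWriter and the fsyncOnCommit flag are hypothetical; no such Flume property exists today):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch: writes always go to the OS page cache; the fsync
// (FileChannel.force) on commit is skipped when the flag is false, leaving
// durability to background flushing / a later checkpoint.
class OptionalFsyncWriter implements AutoCloseable {
    private final FileChannel channel;
    private final boolean fsyncOnCommit;

    OptionalFsyncWriter(Path log, boolean fsyncOnCommit) throws IOException {
        this.channel = FileChannel.open(log,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        this.fsyncOnCommit = fsyncOnCommit;
    }

    void put(byte[] event) throws IOException {
        channel.write(ByteBuffer.wrap(event)); // lands in OS buffers only
    }

    void commit() throws IOException {
        if (fsyncOnCommit) {
            channel.force(false); // the expensive, durable part
        }
        // else: data survives a process crash but not a power loss,
        // which matches the MemoryChannel-like guarantee discussed here
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }
}
```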
{quote}
5) WRT "we should make the configuration change": can you elaborate? I am not
certain which change specifically you are referring to. Or are you referring
to the whole config approach?
6) WRT lifecycle management and dependencies: after configuration, any channel
that is found not to be connected to a source/sink is automatically discarded
from the list of lifecycle-system-managed components. Consequently the
Spillable Channel becomes the sole lifecycle manager of the overflow channel.
Otherwise, yes, there would be havoc.
{quote}
I just think we should not allow one component to pull a reference to another
component in the system. This explicitly breaks the "interact via interfaces"
idea. We could make sure the spillable channel owns both channels (and manages
their lifecycle), so that no component ends up able to access other components
owned by the lifecycle manager.
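The ownership idea could be sketched like this (Lifecycle mirrors the spirit of Flume's LifecycleAware interface; RecordingChannel and SpillableChannelSketch are hypothetical stand-ins):

```java
// Hypothetical sketch: the spillable channel is the only holder of the
// sub-channel references, so no other component can reach them and
// "interact via interfaces" is preserved.
interface Lifecycle {
    void start();
    void stop();
}

class RecordingChannel implements Lifecycle {
    boolean running = false;
    public void start() { running = true; }
    public void stop()  { running = false; }
}

class SpillableChannelSketch implements Lifecycle {
    private final Lifecycle primary;
    private final Lifecycle overflow;

    SpillableChannelSketch(Lifecycle primary, Lifecycle overflow) {
        this.primary = primary;   // owned; never registered with the
        this.overflow = overflow; // global lifecycle supervisor
    }

    public void start() { primary.start(); overflow.start(); }
    public void stop()  { overflow.stop(); primary.stop(); } // reverse order
}
```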
Hope I made myself clearer this time!
> Introduce some sort of SpillableChannel
> ---------------------------------------
>
> Key: FLUME-1227
> URL: https://issues.apache.org/jira/browse/FLUME-1227
> Project: Flume
> Issue Type: New Feature
> Components: Channel
> Reporter: Jarek Jarcec Cecho
> Assignee: Roshan Naik
> Attachments: 1227.patch.1, SpillableMemory Channel Design.pdf
>
>
> I would like to introduce a new channel that would behave similarly to scribe
> (https://github.com/facebook/scribe). It would be something between the
> memory and file channels. Input events would be saved directly to memory
> (only) and would be served from there. In case the memory became full, we
> would spill the events to a file.
> Let me describe the use case behind this request. We have plenty of frontend
> servers that are generating events. We want to send all events to just a
> limited number of machines from which we would send the data to HDFS (some
> sort of staging layer). The reason for this second layer is our need to
> decouple event aggregation and frontend code onto separate machines. Using
> the memory channel is fully sufficient, as we can survive the loss of some
> portion of the events. However, in order to sustain maintenance windows or
> networking issues, we would end up with a lot of memory assigned to those
> "staging" machines. The referenced "scribe" deals with this problem by
> implementing the following logic: events are saved in memory, similarly to
> our MemoryChannel. However, in case the memory gets full (because of
> maintenance, networking issues, ...), it will spill data to disk, where it
> will sit until everything starts working again.
> I would like to introduce a channel that would implement similar logic. Its
> durability guarantees would be the same as the MemoryChannel's: in case
> someone pulled the power cord, this channel would lose data. Based on the
> discussion in FLUME-1201, I would propose to make the implementation
> completely independent of any other channel's internal code.
> Jarcec
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira