----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/9544/ -----------------------------------------------------------
(Updated Feb. 21, 2013, 8:40 p.m.) Review request for Flume. Description ------- Initial patch for capturing feedback on SpillableChannel. Open Issues: a) Config & Overflow Specification : It currently allows configuration (as shown below) for selecting any other channel as a overflow. The current config subsystem doesn't support this mechanism well. To make this happen, i have modified the config subsystem. A postconfig step is introduced to enable a channel to obtain a reference to another configured channel by name. One suggested alternative has been to hard code the file channel as the overflow channel at compile time. Downside would be that if a new SSD optimized channel is introduced in the future, the users of Spillable channel would not be able to take advantage of it. b) Max Transaction Size: This implementation requires a config param maxTransactionBatchSize, which is the size of the largest batchSize setting used by all the sinks and sources attached to this channel. This helps a good deal in optimizing the implementation of put() and take() internally. Consequently, unlike the transactionSize used in other channels, this value should be a much smaller and closer to the actual batchSize settings in the sources/sinks. Hari had mentioned some concerns in this regard which I think had something to do with not all sources being able to know their batchSize upfront. I not totally clear on why it is the case. Would appreciate more insight. Notes: a) Basic algorithm: This implementation avoids copying events between the in-memory queue and the overflow for better performance. at the time of put(), a decision is made to either put the entire transaction in memory or into overflow based on maxTransactionSize and slots available in memory queue. Another queue (called 'drain order queue') is used to memorize the order in which the events were inserted. For example these values in the drain order queue ... +1000,-200,+400,-100 ... indicate the first 1000 elements are in memory, next 200 (the -ve number) are in overflow, next 400 are in memory and so on. This is used at the time of take() to drain elements in the right order directly from memory/overflow and avoid copying back and forth between memory & overflow. Any put/take Transaction will restricted to either the memory or overflow. Allowing it to operate on both will violate correctness of rollback/commit by requiring nested transactions. b) Performance: Currently my measurements are showing decent performance improvements over the MemoryChannel when there is no overflow occurring (between 7% and 70% depending on number of sources/sinks attached). In the case there is overflow, dont have much measurements to provide right now. this will also depend on what we do with point a) # Name the components on this agent agent1.sources = source1 agent1.sinks = sink1 agent1.channels = spillChannel1 fChannel agent1.sources.source1.type = SEQ agent1.sources.source1.batchSize = 10 agent1.sources.source1.channels = spillChannel1 agent1.sinks.sink1.type = null agent1.sinks.sink1.channel = spillChannel1 agent1.sinks.sink1.batchSize = 10 agent1.channels.spillChannel1.type = org.apache.flume.channel.SpillableMemoryChannel agent1.channels.spillChannel1.maxTransactionBatchSize = 10 agent1.channels.spillChannel1.overflowChannel = fChannel agent1.channels.spillChannel1.memoryCapacity = 100 # memory only agent1.channels.spillChannel1.totalCapacity = 10000 # memory + overflow agent1.channels.spillChannel1.keep-alive = 3 agent1.channels.fChannel.type = file agent1.channels.fChannel.checkpointDir = /tmp/fchannel/checkpoint agent1.channels.fChannel.dataDirs = /tmp/fchannel/data agent1.channels.fChannel.keep-alive = 0 agent1.channels.fChannel.capacity = 10000 This addresses bug https://issues.apache.org/jira/browse/FLUME-1227. https://issues.apache.org/jira/browse/https://issues.apache.org/jira/browse/FLUME-1227 Diffs ----- flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FileChannel.java d98209b flume-ng-channels/flume-spillable-memory-channel/pom.xml PRE-CREATION flume-ng-channels/flume-spillable-memory-channel/src/main/java/org/apache/flume/channel/SpillableMemoryChannel.java PRE-CREATION flume-ng-channels/flume-spillable-memory-channel/src/test/java/org/apache/flume/channel/TestSpillableMemoryChannel.java PRE-CREATION flume-ng-channels/pom.xml 5c6fa76 flume-ng-configuration/src/main/java/org/apache/flume/conf/channel/ChannelConfiguration.java 1e1a46f flume-ng-configuration/src/main/java/org/apache/flume/conf/channel/ChannelType.java 475341d flume-ng-core/src/main/java/org/apache/flume/ChannelFullException.java PRE-CREATION flume-ng-core/src/main/java/org/apache/flume/channel/AbstractChannel.java 1370e66 flume-ng-core/src/main/java/org/apache/flume/channel/MemoryChannel.java 688323d flume-ng-embedded-agent/src/main/java/org/apache/flume/agent/embedded/EmbeddedAgentConfiguration.java 6204bc5 flume-ng-node/pom.xml 035ae06 flume-ng-node/src/main/java/org/apache/flume/node/AbstractConfigurationProvider.java e63c601 pom.xml 36f989d Diff: https://reviews.apache.org/r/9544/diff/ Testing ------- Tests have been included in this patch. Thanks, Roshan Naik
