-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9544/
-----------------------------------------------------------

Review request for Flume.


Description
-------

Initial patch for capturing feedback on SpillableChannel.

Open Issues:  
a) Config & Overflow Specification : 
It currently allows configuration (as shown below)  for selecting any other 
channel as a overflow. The current config subsystem doesn't support this 
mechanism well. To make this happen, i have modified the config subsystem. A 
postconfig step is introduced to enable a channel to obtain a reference to 
another configured channel by name. 
 One suggested alternative has been to hard code the file channel as the 
overflow channel at compile time. Downside would be that if a new SSD optimized 
channel is introduced in the future, the users of Spillable channel would not 
be able to take advantage of it.

b) Max Transaction Size: This implementation requires a config param  
maxTransactionBatchSize, which is the size of the largest batchSize setting 
used by all the sinks and sources attached to this channel. This helps a good 
deal in optimizing the implementation of put() and take() internally. 
Consequently, unlike the transactionSize used in other channels, this value 
should be a much smaller and closer to the actual batchSize settings in the 
sources/sinks. Hari had mentioned some concerns in this regard which I think 
had something to do with not all sources being able to know their batchSize 
upfront. I not totally clear on why it is the case. Would appreciate more 
insight. 

Notes:

a) Basic algorithm: This implementation avoids copying events between the 
in-memory queue and the overflow for better performance. at the time of put(), 
a decision is made to either put the entire transaction in memory or into 
overflow based on maxTransactionSize  and slots available in memory queue.  
Another queue (called 'drain order queue') is used to memorize the order in 
which the events were inserted. For example these values in the  drain order 
queue ... +1000,-200,+400,-100 ... indicate the first 1000 elements are in 
memory, next 200 (the -ve number) are in overflow, next 400 are in memory and 
so on. This is used at the time of take() to drain elements in the right order 
directly from memory/overflow and avoid copying back and forth between memory & 
overflow. Any put/take Transaction will restricted to either the memory or 
overflow. Allowing it to operate on both will violate correctness of 
rollback/commit by requiring nested transactions.

b) Performance: Currently my measurements are showing decent performance 
improvements over the MemoryChannel when there is no overflow occurring 
(between 7% and 70% depending on number of sources/sinks attached).  In the 
case there is overflow, dont have much measurements to provide right now. this 
will also depend on what we do with point a)


# Name the components on this agent
agent1.sources = source1
agent1.sinks = sink1
agent1.channels = spillChannel1  fChannel

agent1.sources.source1.type = SEQ
agent1.sources.source1.batchSize = 10

agent1.sources.source1.channels = spillChannel1

agent1.sinks.sink1.type = null
agent1.sinks.sink1.channel = spillChannel1
agent1.sinks.sink1.batchSize = 10

agent1.channels.spillChannel1.type = 
org.apache.flume.channel.SpillableMemoryChannel
agent1.channels.spillChannel1.maxTransactionBatchSize = 10
agent1.channels.spillChannel1.overflowChannel = fChannel
agent1.channels.spillChannel1.memoryCapacity = 100   # memory only      
agent1.channels.spillChannel1.totalCapacity = 10000    # memory + overflow
agent1.channels.spillChannel1.keep-alive = 3


agent1.channels.fChannel.type = file
agent1.channels.fChannel.checkpointDir = /tmp/fchannel/checkpoint
agent1.channels.fChannel.dataDirs = /tmp/fchannel/data
agent1.channels.fChannel.keep-alive = 0
agent1.channels.fChannel.capacity = 10000


This addresses bug https://issues.apache.org/jira/browse/FLUME-1227.
    
https://issues.apache.org/jira/browse/https://issues.apache.org/jira/browse/FLUME-1227


Diffs
-----

  
flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FileChannel.java
 d98209b 
  flume-ng-channels/flume-spillable-memory-channel/pom.xml PRE-CREATION 
  
flume-ng-channels/flume-spillable-memory-channel/src/main/java/org/apache/flume/channel/SpillableMemoryChannel.java
 PRE-CREATION 
  
flume-ng-channels/flume-spillable-memory-channel/src/test/java/org/apache/flume/channel/TestSpillableMemoryChannel.java
 PRE-CREATION 
  flume-ng-channels/pom.xml 5c6fa76 
  
flume-ng-configuration/src/main/java/org/apache/flume/conf/channel/ChannelConfiguration.java
 1e1a46f 
  
flume-ng-configuration/src/main/java/org/apache/flume/conf/channel/ChannelType.java
 475341d 
  flume-ng-core/src/main/java/org/apache/flume/ChannelFullException.java 
PRE-CREATION 
  flume-ng-core/src/main/java/org/apache/flume/channel/AbstractChannel.java 
1370e66 
  flume-ng-core/src/main/java/org/apache/flume/channel/MemoryChannel.java 
688323d 
  
flume-ng-embedded-agent/src/main/java/org/apache/flume/agent/embedded/EmbeddedAgentConfiguration.java
 6204bc5 
  flume-ng-node/pom.xml 035ae06 
  
flume-ng-node/src/main/java/org/apache/flume/node/AbstractConfigurationProvider.java
 e63c601 
  pom.xml 36f989d 

Diff: https://reviews.apache.org/r/9544/diff/


Testing
-------

Tests have been included in this patch.


Thanks,

Roshan Naik

Reply via email to