Hari created FLUME-2718:
---------------------------

             Summary: HTTP Source to support generic Stream Handler
                 Key: FLUME-2718
                 URL: https://issues.apache.org/jira/browse/FLUME-2718
             Project: Flume
          Issue Type: Improvement
          Components: Sinks+Sources
            Reporter: Hari


Currently, the HTTP Source uses JSONHandler as its default handler implementation.
A BLOBHandler, which accepts any request input stream and loads the stream as the
Event payload, would be more generic. In addition, this handler lets you define
mandatory request parameters and maps those parameters into Event headers.
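To make the proposed behavior concrete, here is a minimal Java sketch of the handler logic. It deliberately avoids Flume's real HTTPSourceHandler interface and the servlet API so it runs standalone. `BlobHandlerSketch`, `SimpleEvent` and `toEvent` are hypothetical names, and the parameter map has the same shape as `HttpServletRequest.getParameterMap()`. It assumes every request parameter (not only the mandatory ones) is copied into the headers, keeping the first value when a parameter repeats.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a BLOB-style handler; not Flume's actual API.
public class BlobHandlerSketch {

    // Minimal stand-in for a Flume Event: string headers plus a raw byte payload.
    public static final class SimpleEvent {
        public final Map<String, String> headers;
        public final byte[] body;

        SimpleEvent(Map<String, String> headers, byte[] body) {
            this.headers = headers;
            this.body = body;
        }
    }

    // Rejects the request if a mandatory parameter is missing, copies the first
    // value of every parameter into the headers (assumption), and stores the
    // whole request stream as the payload.
    public static SimpleEvent toEvent(Map<String, String[]> params,
                                      List<String> mandatory,
                                      InputStream in) {
        for (String name : mandatory) {
            String[] values = params.get(name);
            if (values == null || values.length == 0) {
                throw new IllegalArgumentException("Missing mandatory parameter: " + name);
            }
        }
        Map<String, String> headers = new HashMap<>();
        for (Map.Entry<String, String[]> e : params.entrySet()) {
            if (e.getValue() != null && e.getValue().length > 0) {
                headers.put(e.getKey(), e.getValue()[0]);
            }
        }
        try {
            // Reads the entire body into memory (InputStream.readAllBytes, Java 9+).
            return new SimpleEvent(headers, in.readAllBytes());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, String[]> params = new HashMap<>();
        params.put("basepath", new String[] {"/data/"});
        params.put("filename", new String[] {"test.json"});
        params.put("timestamp", new String[] {"1434101498275"});
        InputStream body = new ByteArrayInputStream(
            "{\"id\": 1}".getBytes(StandardCharsets.UTF_8));
        SimpleEvent event = toEvent(params, Arrays.asList("basepath", "filename"), body);
        System.out.println(event.headers + " payload bytes: " + event.body.length);
    }
}
```

Because the payload is never parsed, JSON, CSV and TSV bodies are treated identically. A production handler would likely also cap the body size, since this sketch buffers the whole request in memory.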

This way, the HTTP Source can be used as a generic data ingress endpoint for any
sink: you specify runtime attributes such as basepath, filename and timestamp
as request parameters, and access those values as header values in the sink
properties.
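In the sink properties, the header values are referenced through escape sequences such as `%{basepath}`. The following is a simplified Java sketch of that substitution, not Flume's own implementation (which supports many more escapes). `HeaderPathSketch.resolve` is a hypothetical helper that handles only `%{header}`, `%Y`, `%m` and `%d`, and it assumes a missing header resolves to an empty string.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, simplified escape resolver; not Flume's implementation.
public class HeaderPathSketch {

    // Matches %{headerName} and captures the header name.
    private static final Pattern HEADER_REF = Pattern.compile("%\\{([^}]+)\\}");

    public static String resolve(String template, Map<String, String> headers,
                                 long epochMillis, ZoneId zone) {
        // Date escapes first, so text taken from header values is never re-interpreted.
        ZonedDateTime t = Instant.ofEpochMilli(epochMillis).atZone(zone);
        String withDate = template
            .replace("%Y", String.format("%04d", t.getYear()))
            .replace("%m", String.format("%02d", t.getMonthValue()))
            .replace("%d", String.format("%02d", t.getDayOfMonth()));

        // Then substitute each %{name} with its header value (empty if absent).
        Matcher m = HEADER_REF.matcher(withDate);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            String value = headers.getOrDefault(m.group(1), "");
            m.appendReplacement(sb, Matcher.quoteReplacement(value));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> headers = Map.of("basepath", "/data", "filename", "test.json");
        System.out.println(resolve("%{basepath}/%Y/%m/%d", headers, 1434101498275L, ZoneOffset.UTC));
        System.out.println(resolve("%{filename}", headers, 1434101498275L, ZoneOffset.UTC));
    }
}
```

With basepath set to /data and the epoch time 1434101498275 interpreted in UTC, the first call yields /data/2015/06/12, the same date directories as the HDFS path in the example.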

All this can be done without developing any custom Handler code.

For example:

With the agent configuration below, you can send any type of data
(JSON/CSV/TSV) and store it in any sink, HDFS in this case.

Curl command:
curl -v -X POST \
  "http://testHost:8080/?basepath=/data/&filename=test.json&timestamp=1434101498275" \
  --data-binary @test.json
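For clients that are not shell scripts, the same request can be built with the JDK's `java.net.http` client (Java 11+). This is a sketch: `BlobPostSketch` and `buildRequest` are hypothetical names. The explicit `Content-Type: application/octet-stream` header is an assumption, added because curl's `--data` options default to `application/x-www-form-urlencoded`, and a servlet container may parse such a body as form parameters before a handler can read the raw stream.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical client sketch mirroring the curl command.
public class BlobPostSketch {

    // Request parameters go into the URL-encoded query string; the raw bytes
    // become the POST body.
    public static HttpRequest buildRequest(String baseUrl, Map<String, String> params, byte[] body) {
        String query = params.entrySet().stream()
            .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
            .collect(Collectors.joining("&"));
        return HttpRequest.newBuilder(URI.create(baseUrl + "?" + query))
            // Assumption: an opaque content type keeps the body from being treated as form data.
            .header("Content-Type", "application/octet-stream")
            .POST(HttpRequest.BodyPublishers.ofByteArray(body))
            .build();
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("basepath", "/data/");
        params.put("filename", "test.json");
        params.put("timestamp", "1434101498275");
        byte[] body = Files.readAllBytes(Path.of("test.json"));
        HttpRequest request = buildRequest("http://testHost:8080/", params, body);
        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println("HTTP " + response.statusCode());
    }
}
```

`URLEncoder` turns `/data/` into `%2Fdata%2F` in the query string; the server decodes it back, so the `basepath` header still reads `/data/`.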

Data created in HDFS 
/data/2015/06/12/test.json.1434101498275.lzo

# Agent configuration
# HTTP Source configuration
agent.sources = httpSrc
agent.channels = memChannel
agent.sources.httpSrc.type = http
agent.sources.httpSrc.channels = memChannel
agent.sources.httpSrc.bind = testHost
agent.sources.httpSrc.port = 8080
agent.sources.httpSrc.handler = org.apache.flume.source.http.BLOBHandler
agent.sources.httpSrc.handler.mandatoryParameters = basepath, filename

# Memory channel configuration
agent.channels.memChannel.type = memory
agent.channels.memChannel.capacity = 100000
agent.channels.memChannel.transactionCapacity = 1000

# HDFS Sink configuration
agent.sinks.hdfsSink.type = hdfs
agent.sinks.hdfsSink.hdfs.path = %{basepath}/%Y/%m/%d
agent.sinks.hdfsSink.hdfs.useLocalTimeStamp = true
agent.sinks.hdfsSink.hdfs.filePrefix = %{filename}
agent.sinks.hdfsSink.hdfs.fileType = CompressedStream
agent.sinks.hdfsSink.hdfs.codeC = lzop
agent.sinks.hdfsSink.channel = memChannel

# Finally, activate the sink (sources and channels are declared at the top).
agent.sinks = hdfsSink
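The Flume user guide lists `handler.*` for the HTTP source as configuration parameters passed to the handler. The Java sketch below, using hypothetical names (`HandlerConfigSketch`, `subProperties`, `parseList`), shows how a handler might obtain `mandatoryParameters` from the agent configuration. It loads the properties, keeps only the keys under the source's `handler.` prefix with that prefix stripped, and splits the comma-separated list while trimming the spaces in `basepath, filename`.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical configuration-parsing sketch; not Flume's Context API.
public class HandlerConfigSketch {

    // Parses properties text with java.util.Properties (later duplicate keys win).
    public static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Collects every key that starts with the prefix, with the prefix stripped.
    public static Map<String, String> subProperties(Properties props, String prefix) {
        Map<String, String> result = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            if (key.startsWith(prefix)) {
                result.put(key.substring(prefix.length()), props.getProperty(key));
            }
        }
        return result;
    }

    // Splits "basepath, filename" into [basepath, filename], skipping blank entries.
    public static List<String> parseList(String value) {
        List<String> names = new ArrayList<>();
        if (value == null) {
            return names;
        }
        for (String part : value.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                names.add(trimmed);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        Properties props = load(
            "agent.sources.httpSrc.handler = org.apache.flume.source.http.BLOBHandler\n"
            + "agent.sources.httpSrc.handler.mandatoryParameters = basepath, filename\n");
        Map<String, String> handlerConf = subProperties(props, "agent.sources.httpSrc.handler.");
        System.out.println(parseList(handlerConf.get("mandatoryParameters")));
    }
}
```

The trailing dot in the prefix matters: it excludes the `handler` key itself (the class name) and keeps only the handler's own settings.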




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
