Hari created FLUME-2718:
---------------------------
Summary: HTTP Source to support generic Stream Handler
Key: FLUME-2718
URL: https://issues.apache.org/jira/browse/FLUME-2718
Project: Flume
Issue Type: Improvement
Components: Sinks+Sources
Reporter: Hari
Currently, the HTTP Source supports JSONHandler as its default implementation.
A BLOBHandler that accepts any request input stream and loads the stream as
the Event payload would be more generic. In addition, this Handler lets you
define mandatory request parameters and maps those parameters into Event
Headers.
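The behaviour described above can be sketched roughly as follows. This is only
an illustration of the idea, not the proposed patch: BlobHandlerSketch and
SimpleEvent are hypothetical names, the sketch deliberately avoids Flume's own
handler interface and Event classes, and copying every request parameter
(not only the mandatory ones) into the headers is an assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the proposed handler logic; not Flume's API. */
final class BlobHandlerSketch {

    /** Minimal event model: string headers plus an opaque byte payload. */
    static final class SimpleEvent {
        final Map<String, String> headers;
        final byte[] body;

        SimpleEvent(Map<String, String> headers, byte[] body) {
            this.headers = headers;
            this.body = body;
        }
    }

    /** Splits a comma-separated setting such as "basepath, filename". */
    static List<String> parseMandatory(String configured) {
        List<String> names = new ArrayList<>();
        for (String part : configured.split(",")) {
            String name = part.trim();
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return names;
    }

    /** Rejects requests lacking a mandatory parameter, then wraps the raw stream. */
    static SimpleEvent toEvent(Map<String, String> params, InputStream in,
                               List<String> mandatory) {
        for (String name : mandatory) {
            String value = params.get(name);
            if (value == null || value.isEmpty()) {
                throw new IllegalArgumentException(
                        "Missing mandatory request parameter: " + name);
            }
        }
        // Assumption: all request parameters become event headers.
        Map<String, String> headers = new LinkedHashMap<>(params);
        return new SimpleEvent(headers, readAll(in));
    }

    /** Reads the whole request body into memory without interpreting it. */
    private static byte[] readAll(InputStream in) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("basepath", "/data/");
        params.put("filename", "test.json");
        InputStream body = new ByteArrayInputStream(
                "{\"k\": 1}".getBytes(StandardCharsets.UTF_8));
        SimpleEvent event = toEvent(params, body, parseMandatory("basepath, filename"));
        System.out.println(event.headers + " bodyBytes=" + event.body.length);
    }
}
```

Because the payload is never parsed, the same code path serves JSON, CSV or
TSV bodies; only the presence of the mandatory parameters is checked.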
This way, HTTPSource can be used as a generic data ingress endpoint for any
sink: one can specify attributes such as basepath, filename and timestamp
as request parameters and access those values via header values in the sink
properties.
All this can be done without developing any custom Handler code.
For example, with the agent configuration below, you can send any type of data
(JSON/CSV/TSV) and store it in any sink, HDFS in this case.
Curl command --
curl -v -X POST \
  "http://testHost:8080/?basepath=/data/&filename=test.json&timestamp=1434101498275" \
  --data @test.json
Data created in HDFS
/data/2015/06/12/test.json.1434101498275.lzo
# Agent configuration
# HTTP Source configuration
agent.sources = httpSrc
agent.channels = memChannel
agent.sources.httpSrc.type = http
agent.sources.httpSrc.channels = memChannel
agent.sources.httpSrc.bind = testHost
agent.sources.httpSrc.port = 8080
agent.sources.httpSrc.handler = org.apache.flume.source.http.BLOBHandler
agent.sources.httpSrc.handler.mandatoryParameters = basepath, filename
# Memory channel with default configuration
agent.channels.memChannel.type = memory
agent.channels.memChannel.capacity = 100000
agent.channels.memChannel.transactionCapacity = 1000
# HDFS Sink configuration
agent.sinks.hdfsSink.type = hdfs
agent.sinks.hdfsSink.hdfs.path = %{basepath}/%Y/%m/%d
agent.sinks.hdfsSink.hdfs.useLocalTimeStamp = true
agent.sinks.hdfsSink.hdfs.filePrefix = %{filename}
agent.sinks.hdfsSink.hdfs.fileType = CompressedStream
agent.sinks.hdfsSink.hdfs.codeC = lzop
agent.sinks.hdfsSink.channel = memChannel
# Finally, activate.
agent.channels = memChannel
agent.sources = httpSrc
agent.sinks = hdfsSink
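
To illustrate how a template like the hdfs.path value above could turn event
headers and a timestamp into a directory, here is a small sketch. It is a
hypothetical re-implementation for illustration only, not Flume's own
substitution code: PathTemplateSketch and resolve are made-up names, only
%{header}, %Y, %m and %d are handled, and UTC is assumed (with
hdfs.useLocalTimeStamp = true the sink would use the agent's local clock
instead).

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration of header and date substitution in a path template. */
final class PathTemplateSketch {

    // Matches %{name} and captures the header name.
    private static final Pattern HEADER_REF = Pattern.compile("%\\{([^}]+)\\}");

    static String resolve(String template, Map<String, String> headers, long epochMillis) {
        // Step 1: replace each %{name} with the header value (empty if absent).
        Matcher m = HEADER_REF.matcher(template);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String value = headers.get(m.group(1));
            m.appendReplacement(sb, Matcher.quoteReplacement(value == null ? "" : value));
        }
        m.appendTail(sb);

        // Step 2: replace the date escapes, assuming UTC for reproducibility.
        ZonedDateTime t = Instant.ofEpochMilli(epochMillis).atZone(ZoneOffset.UTC);
        return sb.toString()
                .replace("%Y", String.format(Locale.ROOT, "%04d", t.getYear()))
                .replace("%m", String.format(Locale.ROOT, "%02d", t.getMonthValue()))
                .replace("%d", String.format(Locale.ROOT, "%02d", t.getDayOfMonth()));
    }

    public static void main(String[] args) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("basepath", "/data/");
        System.out.println(resolve("%{basepath}/%Y/%m/%d", headers, 1434101498275L));
    }
}
```

The sketch does no path normalization, so a basepath ending in "/" leaves a
doubled slash in the result.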