When Flume is fed an Avro file via the spooldir source and the data is
drained into HDFS using the avro_event serializer, I notice that what ends
up in HDFS is an Avro file with the schema below:
{
  "type" : "record",
  "name" : "Event",
  "fields" : [ {
    "name" : "headers",
    "type" : {
      "type" : "map",
      "values" : "string"
    }
  }, {
    "name" : "body",
    "type" : "bytes"
  } ]
}
Basically, each record is not the original Avro record but instead the
Flume event object (i.e. headers + body), with the original Avro record
embedded in the body.
If the event header carries a schema, that schema ends up in every record of
the output file.
AFAICT, this is undesirable. What is desired is to recreate the
original file contents.
*My question* is how to configure Flume so that the original records are
written to the destination file instead of being wrapped in FlumeEvent objects.
*details:*
I attached a debugger to the running agent, and it looks like the spooldir source is
deserializing the Avro file correctly into Flume events (header with schema,
body with Avro datum).
But on the HDFS sink side, the Avro serializer is writing out the *whole*
FlumeEvent object instead of serializing just the event body.
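To make the difference concrete, here is a rough Python sketch of the two behaviours. It uses plain stand-ins only, not the Flume or Avro APIs: the Event class, the "schema" header key, the sample schema, and both function names are made up for illustration.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Event:
    """Hypothetical stand-in for a Flume event: string headers plus an opaque body."""
    headers: Dict[str, str] = field(default_factory=dict)
    body: bytes = b""


def write_wrapped(events: List[Event]) -> List[dict]:
    """Observed behaviour: each output record is the whole event (headers + body)."""
    return [{"headers": dict(e.headers), "body": e.body} for e in events]


def write_bodies(events: List[Event]) -> List[bytes]:
    """Desired behaviour: each output record is only the original datum from the body."""
    return [e.body for e in events]


# Placeholder schema and datums; the real bodies would be Avro-encoded records.
schema = json.dumps(
    {"type": "record", "name": "User", "fields": [{"name": "id", "type": "long"}]}
)
events = [
    Event({"schema": schema}, b"datum-1"),
    Event({"schema": schema}, b"datum-2"),
]

wrapped = write_wrapped(events)    # schema header is repeated in every record
unwrapped = write_bodies(events)   # only the original records remain
```

In the wrapped output the schema string travels with every record, which is the per-record overhead described above; in the unwrapped output the schema would only need to be stored once, in the file header.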
Below is the config I used:
agent.channels = memoryChannel
agent.channels.memoryChannel.type = memory
agent.sources = sd
agent.sources.sd.type = spooldir
agent.sources.sd.spoolDir = /tmp/avro
agent.sources.sd.deserializer = AVRO
agent.sources.sd.deserializer.schemaType = LITERAL
agent.sources.sd.channels = memoryChannel
agent.sinks = hdfsSink
agent.sinks.hdfsSink.type = hdfs
agent.sinks.hdfsSink.channel = memoryChannel
agent.sinks.hdfsSink.hdfs.path = /tmp/flumetest/avro
agent.sinks.hdfsSink.hdfs.fileType = DataStream
agent.sinks.hdfsSink.serializer = avro_event
agent.sinks.hdfsSink.hdfs.callTimeout = 5000
-roshan