[ 
https://issues.apache.org/jira/browse/FLUME-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15782068#comment-15782068
 ] 

Laxman commented on FLUME-2942:
-------------------------------

I answered a related query on the user mailing list. The answer may be useful
for further discussion here.

{noformat}
In short, FlumeEventAvroEventSerializer and AvroEventDeserializer are not in 
sync, so they can't be used as a serde pair.

Flume's built-in Avro serializer, FlumeEventAvroEventSerializer, serializes 
Flume events together with their shell. Note that the actual raw event is 
wrapped inside the Flume shell object, and this raw object is treated as binary 
(it can be Thrift, Avro, a plain byte array, etc.). 
Flume's built-in Avro deserializer, AvroEventDeserializer, deserializes any 
generic Avro-serialized record and wraps the deserialized record in another 
Flume shell object.

This means that, with your configuration, the spool directory source 
(persistence-dev-source) will get a double-wrapped Flume event (FlumeEvent -> 
FlumeEvent -> raw event body).

To solve this problem, the serializer and deserializer need to be in sync. 
Either of the following approaches achieves that.
- Use a custom FlumeEventAvroEventDeserializer that extracts the FlumeEvent 
directly, without the double wrapper, and use it with the spool directory source.

A similar attempt has already been made by Sebastian here:
https://issues.apache.org/jira/browse/FLUME-2942

I personally recommend writing a new FlumeEventAvroEventDeserializer rather 
than modifying the existing one.

- Use a custom AvroEventSerializer that serializes the Avro event directly, and 
use it with the file_roll sink.
A reference implementation is available in the HDFS sink 
(org.apache.flume.sink.hdfs.AvroEventSerializer).
You can strip the HDFS dependencies from it to achieve what you want.
{noformat}
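
The double wrapping described above can be illustrated with a small
stand-alone sketch. This is only a simplified model, not Flume or Avro code:
the class DoubleWrapSketch, its Event type, the three helper methods, and the
length-prefixed byte layout are all hypothetical stand-ins chosen for
illustration. It assumes a serializer that writes headers plus body, a generic
deserializer that treats the whole record as the new body, and a matching
deserializer that unpacks the record again.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified stand-in model, NOT the Flume or Avro API. All names are hypothetical.
final class DoubleWrapSketch {

    /** Stand-in for a Flume event: headers plus an opaque body. */
    static final class Event {
        final Map<String, String> headers;
        final byte[] body;

        Event(Map<String, String> headers, byte[] body) {
            this.headers = headers;
            this.body = body;
        }
    }

    /** Models a serializer that writes the whole shell (headers and body). */
    static byte[] serializeWithShell(Event e) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeInt(e.headers.size());
        for (Map.Entry<String, String> h : e.headers.entrySet()) {
            out.writeUTF(h.getKey());
            out.writeUTF(h.getValue());
        }
        out.writeInt(e.body.length);
        out.write(e.body);
        out.flush();
        return buf.toByteArray();
    }

    /** Models a generic deserializer: the whole record becomes the new body. */
    static Event deserializeGeneric(byte[] record) {
        return new Event(new LinkedHashMap<>(), record);
    }

    /** Models a matching deserializer: unpacks headers and body separately. */
    static Event deserializeMatching(byte[] record) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(record));
        int count = in.readInt();
        Map<String, String> headers = new LinkedHashMap<>();
        for (int i = 0; i < count; i++) {
            String key = in.readUTF();
            String value = in.readUTF();
            headers.put(key, value);
        }
        byte[] body = new byte[in.readInt()];
        in.readFully(body);
        return new Event(headers, body);
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("topic", "demo-topic"); // illustrative header value
        Event original = new Event(headers, "payload".getBytes(StandardCharsets.UTF_8));
        byte[] record = serializeWithShell(original);

        Event wrapped = deserializeGeneric(record);    // headers lost, body = whole record
        Event unwrapped = deserializeMatching(record); // headers and body restored

        System.out.println("generic headers:  " + wrapped.headers);
        System.out.println("matching headers: " + unwrapped.headers);
        System.out.println("matching body:    "
                + new String(unwrapped.body, StandardCharsets.UTF_8));
    }
}
```

In this model the generic path returns an event with empty headers whose body
still contains the serialized headers, which mirrors the symptom reported in
the issue. A real fix would need to read the Avro record written by
FlumeEventAvroEventSerializer rather than this toy byte layout.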

> AvroEventDeserializer ignores header from spool source
> ------------------------------------------------------
>
>                 Key: FLUME-2942
>                 URL: https://issues.apache.org/jira/browse/FLUME-2942
>             Project: Flume
>          Issue Type: Bug
>    Affects Versions: v1.6.0
>            Reporter: Sebastian Alfers
>         Attachments: FLUME-2942-0.patch
>
>
> I have a spool file source and use Avro for serialization and deserialization.
> In detail, the serialized events store the topic of the Kafka sink in the header.
> When I load the events from the spool directory, the headers are ignored. 
> Please see: 
> https://github.com/apache/flume/blob/caa64a1a6d4bc97be5993cb468516e9ffe862794/flume-ng-core/src/main/java/org/apache/flume/serialization/AvroEventDeserializer.java#L122
> As you can see, it uses the whole event as the body and does not distinguish 
> between the header and the body encoded by Avro.
> Please verify that this is a bug.
> I fixed this bug by using the record that stores header and body separately.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
