[jira] [Commented] (FLUME-2220) ElasticSearch sink - duplicate fields in indexed document

Rotem Hermon (JIRA) Thu, 24 Oct 2013 10:51:21 -0700

    [ 
https://issues.apache.org/jira/browse/FLUME-2220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13804456#comment-13804456
 ]


Rotem Hermon commented on FLUME-2220:
-------------------------------------

Hi Dib

v1 will certainly make the schema less noisy, but this issue is not caused by 
the v0 schema; it just seems to be a bug in the sink. The serializer creates a 
map of headers, extracts some fields from this map and sets them as top-level 
fields, and then iterates over all the items in the map and adds them under the 
"@fields" field. So the items that were extracted earlier and already added as 
logstash fields are added again under "@fields". This is redundant. Items from 
the map that have been added as top-level fields should be removed from the map 
before the generic pass, so they won't appear twice. 
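
To make the idea concrete, here is a rough, standalone sketch of the 
remove-then-add approach. It is not the actual sink code: the class name, the 
PROMOTED table and buildDocument() are made up for illustration, it uses plain 
maps instead of the Flume and ElasticSearch classes, and it ignores details 
such as timestamp parsing.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DedupHeadersSketch {

    // Hypothetical table: header name -> top-level logstash field name.
    // The real serializer's mapping may differ; this is only illustrative.
    static final Map<String, String> PROMOTED = new LinkedHashMap<>();
    static {
        PROMOTED.put("timestamp", "@timestamp");
        PROMOTED.put("host", "@source_host");
        PROMOTED.put("source", "@source");
    }

    // Builds a document in which each header appears exactly once.
    static Map<String, Object> buildDocument(Map<String, String> eventHeaders) {
        // Work on a copy so the caller's header map is not modified.
        Map<String, String> headers = new LinkedHashMap<>(eventHeaders);
        Map<String, Object> doc = new LinkedHashMap<>();

        for (Map.Entry<String, String> e : PROMOTED.entrySet()) {
            // remove() returns the value and deletes the key in one step,
            // so a promoted header cannot show up again under "@fields".
            String value = headers.remove(e.getKey());
            if (value != null) {
                doc.put(e.getValue(), value);
            }
        }

        // Generic pass: only the headers that were not promoted remain.
        doc.put("@fields", headers);
        return doc;
    }

    public static void main(String[] args) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("timestamp", "1382637081000");
        headers.put("host", "web01");
        headers.put("app", "checkout");
        System.out.println(buildDocument(headers));
    }
}
```

With the sample headers in main(), "timestamp" and "host" end up only as 
top-level fields, and "@fields" contains just "app".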

Hope I managed to be clear. If I get to it, I'll try to attach a code fix 
(still trying to understand the procedure for submitting code to an Apache 
project...).

> ElasticSearch sink - duplicate fields in indexed document
> ---------------------------------------------------------
>
>                 Key: FLUME-2220
>                 URL: https://issues.apache.org/jira/browse/FLUME-2220
>             Project: Flume
>          Issue Type: Bug
>    Affects Versions: v1.4.0
>            Reporter: Rotem Hermon
>            Priority: Minor
>              Labels: ElasticSearch, sink
>
> The default serializer for the ElasticSearch sink 
> (ElasticSearchLogStashEventSerializer) duplicates fields that are mapped to 
> default logstash fields.
> For instance timestamp, source, host. Those appear both as logstash fields 
> ("@timestamp", "@source_host" etc.) and as fields under "@fields" 
> ("@fields.timestamp", "@fields.host").
> When a field from the headers is inserted as a logstash system field, it 
> should be removed from the dictionary so it doesn't get written again under 
> the "@fields" field.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
