[
https://issues.apache.org/jira/browse/FLUME-2220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13804434#comment-13804434
]
Dib Ghosh commented on FLUME-2220:
----------------------------------
Hi Rotem,
This issue is caused by the v0 Logstash JSON schema that Flume uses.
Internally, Flume's ElasticSearchSink adds @source and @source_host to mimic
the v0 Logstash format. Migrating to the v1 Logstash JSON schema should
resolve it.
There is an open Flume issue for this
(https://issues.apache.org/jira/browse/FLUME-2099), and the v0 schema problem
is described in the Logstash tracker at
https://logstash.jira.com/browse/LOGSTASH-675.
To quote that Logstash issue:
"The current logstash json schema has a few problems:
It uses two namespacing techniques when only one is needed ("@" prefixing, like
"@source", and "@fields" object for another namespace)
@source_host and @source_path duplicate @source."
I am also linking your ticket to FLUME-2099.
Hope this helps,
- Dib
> ElasticSearch sink - duplicate fields in indexed document
> ---------------------------------------------------------
>
> Key: FLUME-2220
> URL: https://issues.apache.org/jira/browse/FLUME-2220
> Project: Flume
> Issue Type: Bug
> Affects Versions: v1.4.0
> Reporter: Rotem Hermon
> Priority: Minor
> Labels: ElasticSearch, sink
>
> The default serializer for the ElasticSearch sink
> (ElasticSearchLogStashEventSerializer) duplicates fields that are mapped to
> default logstash fields.
> For instance, timestamp, source, and host appear both as logstash fields
> ("@timestamp", "@source_host", etc.) and as fields under "@fields"
> ("@fields.timestamp", "@fields.host").
> When a field from the headers is inserted as a logstash system field, it
> should be removed from the dictionary so that it is not written again under
> the "@fields" field.
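
The fix proposed in the description can be sketched roughly as follows. This
is a minimal, standalone illustration and not the actual
ElasticSearchLogStashEventSerializer code: the class name HeaderDedupSketch,
the methods toDocument and promote, and the exact header-to-field mapping are
assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: not part of Flume. It only illustrates removing a
// header from the map once it has been written as a logstash system field.
final class HeaderDedupSketch {

    // Builds a logstash-v0-style document from event headers.
    // The header names and target fields below are assumed for illustration.
    static Map<String, Object> toDocument(Map<String, String> headers) {
        // Work on a copy so the caller's header map is left untouched.
        Map<String, String> remaining = new LinkedHashMap<>(headers);
        Map<String, Object> doc = new LinkedHashMap<>();

        promote(remaining, "timestamp", "@timestamp", doc);
        promote(remaining, "source", "@source", doc);
        promote(remaining, "host", "@source_host", doc);

        // Only headers that were not promoted end up under "@fields".
        doc.put("@fields", remaining);
        return doc;
    }

    // Moves one header into a system field; Map.remove returns the old value
    // (or null if absent), so the header cannot be written a second time.
    static void promote(Map<String, String> headers, String header,
                        String systemField, Map<String, Object> doc) {
        String value = headers.remove(header);
        if (value != null) {
            doc.put(systemField, value);
        }
    }

    public static void main(String[] args) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("timestamp", "1382608800000");
        headers.put("host", "web01");
        headers.put("app", "checkout");
        System.out.println(toDocument(headers));
    }
}
```

With the sample headers in main, "timestamp" and "host" appear only as
"@timestamp" and "@source_host", and "app" is the only entry left under
"@fields".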