[ https://issues.apache.org/jira/browse/FLUME-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14504462#comment-14504462 ]
ASF subversion and git services commented on FLUME-2649:
--------------------------------------------------------
Commit d2fc881f549568ea640fa29a96b0b37da64b225d in flume's branch
refs/heads/flume-1.7 from [~hshreedharan]
[ https://git-wip-us.apache.org/repos/asf?p=flume.git;h=d2fc881 ]
FLUME-2649. Elasticsearch sink doesn't handle JSON fields correctly
(Benjamin Fiorini via Hari)
> Elasticsearch sink doesn't handle JSON fields correctly
> -------------------------------------------------------
>
> Key: FLUME-2649
> URL: https://issues.apache.org/jira/browse/FLUME-2649
> Project: Flume
> Issue Type: Bug
> Components: Sinks+Sources
> Reporter: Francis
> Assignee: Benjamin Fiorini
> Attachments: FLUME-2649-0.patch, FLUME-2649-1.patch,
> FLUME-2649-2.patch, FLUME-2649-3.patch, FLUME-2649-4.patch,
> FLUME-2649-5.patch, FLUME-2649-6.patch
>
>
> JSON attributes are treated like normal strings and are escaped by the sink.
> For example, if the body or a header contains the following value:
> {code:javascript}
> {"foo":"bar"}
> {code}
> It will be added like this in Elasticsearch:
> {code:javascript}
> {"@message": "{\"foo\":\"bar\"}"}
> {code}
> We end up with a plain string instead of a valid JSON field.
> I think I found how to fix this bug. The problem is caused by
> the way a "complex field" is added. The ES XContent classes are used to parse
> the data in the detected format, but then, instead of adding the parsed data,
> the string() method is called, which converts it back to a string identical to
> the initial data! Here is the current code with added comments:
> {code}
> XContentBuilder tmp = jsonBuilder(); // This tmp builder is completely useless.
> parser = XContentFactory.xContent(contentType).createParser(data);
> parser.nextToken();
> tmp.copyCurrentStructure(parser); // This copies the whole parsed data into this tmp builder.
> // Here, by calling tmp.string(), we get the parsed data converted back to a string.
> // This means that tmp.string() is equal to new String(data)!
> // All this parsing for nothing...
> // And then, since the field(String, String) method is called on the builder, and the builder is a jsonBuilder,
> // the string will be escaped according to the JSON specification.
> builder.field(fieldName, tmp.string());
> {code}
> If we really want to take advantage of the XContent classes, we have to add
> the parsed data to the builder. To do this, it is as simple as:
> {code}
> parser = XContentFactory.xContent(contentType).createParser(data);
> parser.nextToken();
> // Add the field name, but not the value.
> builder.field(fieldName);
> // This will add the whole parsed content as the value of the field.
> builder.copyCurrentStructure(parser);
> {code}
> I tried this and it works as expected.
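To make the difference between the two outputs concrete, here is a minimal, dependency-free Java sketch. It does not use the Elasticsearch XContent API; the class `EscapeVsEmbed` and its methods are hypothetical. It serializes the same header value two ways: as an escaped JSON string (the behaviour reported above for `field(String, String)`) and embedded as a JSON value (what the proposed fix aims for). The embedded variant assumes its input is already valid JSON; in the proposed fix that guarantee would come from the XContent parser, which this sketch does not reproduce.

```java
// Hypothetical illustration only: not the Elasticsearch XContentBuilder API.
final class EscapeVsEmbed {

    // Escapes a string so it can appear inside a JSON string literal
    // (quotes, backslashes and control characters).
    static String escapeJson(String s) {
        StringBuilder out = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                case '\t': out.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        out.append(String.format("\\u%04x", (int) c));
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.toString();
    }

    // Mirrors the reported bug: the value is written as a quoted, escaped string.
    static String asStringField(String name, String value) {
        return "{\"" + escapeJson(name) + "\":\"" + escapeJson(value) + "\"}";
    }

    // Mirrors the goal of the fix: the value is written as-is, unquoted.
    // Assumption: `json` is already valid JSON; nothing here checks that.
    static String asEmbeddedField(String name, String json) {
        return "{\"" + escapeJson(name) + "\":" + json + "}";
    }

    public static void main(String[] args) {
        String header = "{\"foo\":\"bar\"}";
        System.out.println(asStringField("@message", header));   // {"@message":"{\"foo\":\"bar\"}"}
        System.out.println(asEmbeddedField("@message", header)); // {"@message":{"foo":"bar"}}
    }
}
```

Only the second form stores `@message` as a JSON object, so `foo` stays a field of its own rather than part of an opaque string.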
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)