[
https://issues.apache.org/jira/browse/FLUME-2126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309279#comment-14309279
]
Francis commented on FLUME-2126:
--------------------------------
I don't understand how this will fix the bug. By calling tmp.string(), the
field will be added as a string. The JsonXContentGenerator will then call
jackson.JsonGenerator.writeString and the string will become invalid Json, as
it will be escaped. The jackson.JsonGenerator.writeString documentation is
clear:
"Method for outputting a String value. Depending on context this means either
array element, (object) field value or a stand alone String; but in all cases,
String will be surrounded in double quotes, and contents will be properly
escaped as required by Json specification."
I tried the patch and this is exactly what I get. For example, when the body of
an event is {"foo":"bar"}, the resulting document in ES will contain
"{\"foo\":\"bar\"}". ES views the field as plain text, not Json.
I really don't understand what the elasticsearch sink is trying to do. If it
detects that the field is Json, it will parse it to make sure it's valid Json,
but it will then be added as plain text. That's almost the same as if all
fields were added by using the addSimpleField method, minus the Json
validation! The original code would have been fine if the ES Java API
documentation were right. It says: "By the way, the field method accepts many
object types. You can directly pass numbers, dates and even other
XContentBuilder objects". But looking at the source code, this is clearly
wrong: there's no field method accepting an XContentBuilder as value. To get
around this issue, I think the sink should call rawField when detecting a field
as Json. This will ensure that the string won't be escaped and will be treated
as a Json field by ES.
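To illustrate the difference, here is a minimal, library-free sketch. The class
RawVsStringField and its two helper methods are hypothetical stand-ins, not the
Jackson or Elasticsearch API: asStringField mimics what writing the value as a
string does (surround with quotes and escape), while asRawField mimics copying
the value verbatim, which is the behavior I would expect from a raw field.

```java
public class RawVsStringField {

    // Hypothetical stand-in for writing a field as a JSON string value:
    // the value is wrapped in double quotes and its backslashes and
    // double quotes are escaped (other control characters are ignored here).
    public static String asStringField(String name, String value) {
        String escaped = value.replace("\\", "\\\\").replace("\"", "\\\"");
        return "{\"" + name + "\":\"" + escaped + "\"}";
    }

    // Hypothetical stand-in for writing a field as raw content:
    // the value is copied verbatim, so it must already be valid JSON.
    public static String asRawField(String name, String rawJson) {
        return "{\"" + name + "\":" + rawJson + "}";
    }

    public static void main(String[] args) {
        String body = "{\"foo\":\"bar\"}";

        // prints {"body":"{\"foo\":\"bar\"}"}  (a plain-text field)
        System.out.println(asStringField("body", body));

        // prints {"body":{"foo":"bar"}}  (a nested JSON object)
        System.out.println(asRawField("body", body));
    }
}
```

The first form is what I get with the patch; the second is what the sink should
produce when the body is detected as Json.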
Does this make sense, or am I missing something here?
> Problem in elasticsearch sink when the event body is a complex field
> --------------------------------------------------------------------
>
> Key: FLUME-2126
> URL: https://issues.apache.org/jira/browse/FLUME-2126
> Project: Flume
> Issue Type: Bug
> Components: Sinks+Sources
> Environment: 1.3.1 and 1.4
> Reporter: Massimo Paladin
> Assignee: Ashish Paliwal
> Attachments: FLUME-2126-0.patch
>
>
> I have found a bug in the elasticsearch sink. The problem is in the
> {{ContentBuilderUtil.addComplexField}} method: when it calls
> {{builder.field(fieldName, tmp);}}, the {{tmp}} object is treated as an {{Object}}
> and is serialized with the {{toString}} method in the
> {{XContentBuilder}}. In the end you get the object reference as content.
> The following change works around the problem for me. The downside is that it
> has to parse the content twice. I guess there is a better way to solve the
> problem, but I am not an elasticsearch API expert.
> {code}
> --- a/flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/ContentBuilderUtil.java
> +++ b/flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/ContentBuilderUtil.java
> @@ -61,7 +61,12 @@ public class ContentBuilderUtil {
> parser = XContentFactory.xContent(contentType).createParser(data);
> parser.nextToken();
> tmp.copyCurrentStructure(parser);
> - builder.field(fieldName, tmp);
> +
> + // if it is a valid structure then we include it
> + parser = XContentFactory.xContent(contentType).createParser(data);
> + parser.nextToken();
> + builder.field(fieldName);
> + builder.copyCurrentStructure(parser);
> } catch (JsonParseException ex) {
> // If we get an exception here the most likely cause is nested JSON that
> // can't be figured out in the body. At this point just push it through
> {code}