[ 
https://issues.apache.org/jira/browse/FLUME-2126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309279#comment-14309279
 ] 

Francis commented on FLUME-2126:
--------------------------------

I don't understand how this will fix the bug. Calling tmp.string() adds the 
field as a string. The JsonXContentGenerator will then call 
jackson.JsonGenerator.writeString, which escapes the content, so the value ends 
up as an escaped JSON string rather than a JSON object. The 
jackson.JsonGenerator.writeString documentation is clear:

"Method for outputting a String value. Depending on context this means either 
array element, (object) field value or a stand alone String; but in all cases, 
String will be surrounded in double quotes, and contents will be properly 
escaped as required by Json specification."

I tried the patch and this is exactly what I get. For example, when the body of 
an event is {"foo":"bar"}, the resulting document in ES contains 
"{\"foo\":\"bar\"}". ES sees the field as plain text, not JSON.

I really don't understand what the elasticsearch sink is trying to do. If it 
detects that the field is JSON, it parses it to make sure it's valid, but then 
adds it as plain text. That's almost the same as adding every field with the 
addSimpleField method, minus the JSON validation! The original code would have 
been fine if the ES Java API documentation were right. It says: "By the way, 
the field method accepts many object types. You can directly pass numbers, 
dates and even other XContentBuilder objects". But looking at the source code, 
this is clearly wrong: there's no field method that accepts an XContentBuilder 
as a value. To get around this, I think the sink should call rawField when it 
detects that a field is JSON. This ensures that the string won't be escaped 
and that ES will treat it as a JSON field.
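
To illustrate the difference, here is a minimal sketch in plain Java with no 
Jackson or Elasticsearch dependency. Everything in it is made up for 
illustration: the class RawFieldSketch, the helpers quote, stringField and 
rawField, and the field name "body". It only mimics the quoting and escaping 
that the writeString documentation describes, and contrasts it with copying the 
JSON text through untouched, which is what I would expect rawField to do.

```java
public class RawFieldSketch {

    // Quote and escape text as a JSON string value, as the writeString
    // documentation describes. Hypothetical helper, not Jackson code.
    public static String quote(String text) {
        StringBuilder out = new StringBuilder("\"");
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                case '\t': out.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters use the \\uXXXX form.
                        out.append(String.format("\\u%04x", (int) c));
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.append('"').toString();
    }

    // Field written as a string: the value is quoted and escaped.
    public static String stringField(String name, String value) {
        return "{" + quote(name) + ":" + quote(value) + "}";
    }

    // Field written raw: the value is copied as-is, so the caller must
    // already know that it is valid JSON.
    public static String rawField(String name, String json) {
        return "{" + quote(name) + ":" + json + "}";
    }

    public static void main(String[] args) {
        String body = "{\"foo\":\"bar\"}";
        // prints {"body":"{\"foo\":\"bar\"}"}
        System.out.println(stringField("body", body));
        // prints {"body":{"foo":"bar"}}
        System.out.println(rawField("body", body));
    }
}
```

The first line of output is what I see with the patch. The second is what I 
would expect ES to index as a nested object.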

Does this make sense, or am I missing something here?

> Problem in elasticsearch sink when the event body is a complex field
> --------------------------------------------------------------------
>
>                 Key: FLUME-2126
>                 URL: https://issues.apache.org/jira/browse/FLUME-2126
>             Project: Flume
>          Issue Type: Bug
>          Components: Sinks+Sources
>         Environment: 1.3.1 and 1.4
>            Reporter: Massimo Paladin
>            Assignee: Ashish Paliwal
>         Attachments: FLUME-2126-0.patch
>
>
> I have found a bug in the elasticsearch sink. The problem is in the 
> {{ContentBuilderUtil.addComplexField}} method: when it calls 
> {{builder.field(fieldName, tmp);}} the {{tmp}} object is treated as an 
> {{Object}} and is serialized with its {{toString}} method by the 
> {{XContentBuilder}}. In the end you get the object reference as content.
> The following change works around the problem for me. The downside is that it 
> has to parse the content twice. I guess there is a better way to solve the 
> problem, but I am not an elasticsearch API expert. 
> {code}
> --- a/flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/ContentBuilderUtil.java
> +++ b/flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/ContentBuilderUtil.java
> @@ -61,7 +61,12 @@ public class ContentBuilderUtil {
>        parser = XContentFactory.xContent(contentType).createParser(data);
>        parser.nextToken();
>        tmp.copyCurrentStructure(parser);
> -      builder.field(fieldName, tmp);
> +
> +      // if it is a valid structure then we include it
> +      parser = XContentFactory.xContent(contentType).createParser(data);
> +      parser.nextToken();
> +      builder.field(fieldName);
> +      builder.copyCurrentStructure(parser);
>      } catch (JsonParseException ex) {
>        // If we get an exception here the most likely cause is nested JSON that
>        // can't be figured out in the body. At this point just push it through
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
