Francis created FLUME-2649:
------------------------------

             Summary: Elasticsearch sink doesn't handle JSON fields correctly
                 Key: FLUME-2649
                 URL: https://issues.apache.org/jira/browse/FLUME-2649
             Project: Flume
          Issue Type: Bug
          Components: Sinks+Sources
            Reporter: Francis


JSON attributes are treated like normal strings and are escaped by the sink. 
For example, if the body or a header contains the following value:
{code:javascript}
{"foo":"bar"}
{code}
It will be added like this in Elasticsearch:
{code:javascript}
{"@message": "{\"foo\":\"bar\"}"}
{code}
We end up with a plain string instead of a valid JSON field.
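
To make the difference concrete, here is a minimal stand-alone sketch that does not use the Elasticsearch classes at all. The class and method names (JsonFieldDemo, asStringField, asStructuredField) are hypothetical and only imitate the two outcomes: a value written as an escaped JSON string versus a value embedded as JSON structure. It assumes the input is already valid JSON and does not validate it.

```java
public class JsonFieldDemo {
    // Escapes a value per JSON string rules. Only quotes, backslashes and
    // newlines are handled here, which is enough for this illustration.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    // Imitates the reported behavior: the value is written as a quoted,
    // escaped string.
    static String asStringField(String name, String value) {
        return "{\"" + name + "\":\"" + escape(value) + "\"}";
    }

    // Imitates the desired behavior: the value is embedded as JSON structure.
    // Assumption: 'json' is already valid JSON; nothing is validated here.
    static String asStructuredField(String name, String json) {
        return "{\"" + name + "\":" + json + "}";
    }

    public static void main(String[] args) {
        String body = "{\"foo\":\"bar\"}";
        // prints {"@message":"{\"foo\":\"bar\"}"}
        System.out.println(asStringField("@message", body));
        // prints {"@message":{"foo":"bar"}}
        System.out.println(asStructuredField("@message", body));
    }
}
```

The second form is what one would expect to see indexed, so that "foo" becomes a queryable field rather than part of an opaque string.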

I think I found how to fix this bug. The problem is caused by the way a
"complex field" is added. The ES XContent classes are used to parse the data in
the detected format, but then, instead of adding the parsed data, the string()
method is called, which converts it back to a string equivalent to the initial
data. Here is the current code with added comments:
{code}
XContentBuilder tmp = jsonBuilder(); // This tmp builder is completely useless.
parser = XContentFactory.xContent(contentType).createParser(data);
parser.nextToken();
// This copies the whole parsed data into the tmp builder.
tmp.copyCurrentStructure(parser);
// Here, by calling tmp.string(), we get the parsed data converted back to a
// string. This means that tmp.string() == String(data)!
// All this parsing for nothing...
// And then, because field(String, String) is called on the builder, and the
// builder is a jsonBuilder, the string is escaped according to the JSON
// specification.
builder.field(fieldName, tmp.string());
{code}
If we really want to take advantage of the XContent classes, we have to add the 
parsed data to the builder. To do this, it is as simple as:
{code}
parser = XContentFactory.xContent(contentType).createParser(data);
parser.nextToken();
// Add the field name, but not the value.
builder.field(fieldName);
// This will add the whole parsed content as the value of the field.
builder.copyCurrentStructure(parser);
{code}
I tried this and it works as expected.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
