My guess is that Gson adds that field to its output. The base64 suggests
that there's some binary data in the mix.
By the way, can you show more of your code? Is there any reason why you create the JSON yourself rather than just passing
logEntryMap to es-hadoop?
It can create the JSON for you, which is what I recommend; unless you already have the JSON in HDFS, it's best to rely on
es-hadoop to do it instead of an external tool.
Cheers,
On 3/20/14 4:48 PM, Brian Stempin wrote:
Hi,
All I'm doing is building a map and passing that to Gson for serialization. A
snippet from my map method:
logEntryMap.put("cs(User-Agent)", values[9]);
context.write(NullWritable.get(), new Text(gson.toJson(logEntryMap)));
values[] is a String array. Everything that goes into the map that gets
serialized is a string.
I do have es.input.json set to true. This failure doesn't occur until
more than 100,000,000 records are in the index, so it's happening late
in the load process. The part that I find strange is that the field in
question isn't in my mapping, and I've not touched the default mapping.
I'm not sure why it would try to parse it as anything other than a string.
I'll turn on TRACE logging and see what happens.
Brian
On Wed, Mar 19, 2014 at 5:35 PM, Costin Leau wrote:
Hi,
How do you pass the json to es-hadoop? Do you have an example? By the way,
you can enable TRACE logging on
org.elasticsearch.hadoop and see everything that es-hadoop does, including
the data that goes over the wire.
My guess is that the conversion of logs to JSON creates some extra
artifacts which are later interpreted as
a Writable object (instead of raw JSON) by es-hadoop.
Make sure you tell es-hadoop that its source is JSON (by setting
es.input.json to true).
The logs will likely confirm (or not) the above :)
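For reference, here is a minimal sketch of the two setups discussed in this thread, written as plain key/value settings. es.input.json and es.resource are es-hadoop settings; the class name, the method names, and the logs/entry resource are hypothetical, and in a real job each pair would be set on the Hadoop job configuration rather than kept in a map.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class EsHadoopSettings {

    /** Settings for a job whose mapper emits ready-made JSON strings. */
    public static Map<String, String> forJsonInput(String resource) {
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("es.resource", resource);  // target index/type (hypothetical value passed in)
        settings.put("es.input.json", "true");  // values are already JSON; es-hadoop sends them as-is
        return settings;
    }

    /** Settings for a job that hands es-hadoop map-like objects to serialize. */
    public static Map<String, String> forMapInput(String resource) {
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("es.resource", resource);
        settings.put("es.input.json", "false"); // the default: es-hadoop builds the JSON itself
        return settings;
    }

    public static void main(String[] args) {
        // Print the pairs; a real job would copy them into its configuration.
        forJsonInput("logs/entry").forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

The only difference between the two modes is the es.input.json flag; the value type the mapper emits has to match whichever mode is chosen.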
Cheers,
On 3/19/14 11:14 PM, Brian Stempin wrote:
Hi List,
I have an ES cluster that takes in some data from our logs. We use
Hadoop to parse the individual log entries into JSON strings, which are
then bulk-inserted using ES's output format. For whatever reason, ES
attempts to parse base64 strings as dates and fails. Here's a line from
one of my Hadoop logs:
java.lang.IllegalStateException: Found unrecoverable error [Bad Request(400) -
MapperParsingException[failed to parse [csUriParams.d]]; nested:
MapperParsingException[failed to parse date field [REDACTED BASE64 STRING], tried both date format
[dateOptionalTime], and timestamp number with locale []];
nested: IllegalArgumentException[Invalid format: "Y2lkPURFJml0ZW1zPWE2NTJjLXgxZTFj..."]; ]; Bailing out..
at org.elasticsearch.hadoop.rest.RestClient.retryFailedEntries(RestClient.java:145)
at org.elasticsearch.hadoop.rest.RestClient.bulk(RestClient.java:120)
at org.elasticsearch.hadoop.rest.RestRepository.sendBatch(RestRepository.java:147)
<SNIP>
csUriParams.d does not appear in my mapping, so I never explicitly
asked for it to be treated as a date.
Any idea why ES is trying to treat it as a date?
Thanks,
Brian
--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send
an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/49e5fe0b-cec3-4914-b8d6-99440dd5fb69%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
--
Costin
--
Costin