What you could do then is reduce the bulk size to, say, 100 documents, turn on logging, and run your data through. This way you can catch the 'special' document in the act.
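A rough sketch of what that debugging setup could look like, in case it helps. The class and method names are made up for illustration. `es.batch.size.entries` and `es.input.json` are es-hadoop settings and the logger line uses log4j 1.x properties syntax, but I'm assuming you copy the key/value pairs into your job's Hadoop Configuration yourself and that your cluster uses log4j for task logs.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: collects the settings for a debugging run.
public class EsHadoopDebugSettings {

    // Key/value pairs meant to be copied into the job configuration.
    public static Map<String, String> jobSettings() {
        Map<String, String> settings = new LinkedHashMap<String, String>();
        // The documents handed to es-hadoop are already JSON strings.
        settings.put("es.input.json", "true");
        // Flush a bulk request every 100 documents, so a failing batch
        // narrows the search down to at most 100 candidates.
        settings.put("es.batch.size.entries", "100");
        return settings;
    }

    // Line to add to log4j.properties to see what es-hadoop sends over the wire.
    public static String log4jLine() {
        return "log4j.logger.org.elasticsearch.hadoop=TRACE";
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> e : jobSettings().entrySet()) {
            System.out.println(e.getKey() + "=" + e.getValue());
        }
        System.out.println(log4jLine());
    }
}
```

Keep in mind that TRACE output on a load of this size will be very large, so it is probably worth running only a slice of the input this way.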
As for expectations - Elasticsearch tries to guess the field type by looking at its value - it seems the base64 entry looks like a date, hence the error. You can avoid this by defining the field (either directly or through a template) in your mapping so it always gets mapped to a string.
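To make that concrete, here is a minimal sketch of a mapping body that forces everything under csUriParams to be a string, using a dynamic template (one reading of "through a template"). The type name "logEntry" and the template name are placeholders I picked, not something from your setup, and the "string" / "not_analyzed" values assume the Elasticsearch versions current at the time of this thread. The code only builds the JSON; you would still need to apply it to the index yourself before loading data.

```java
// Hypothetical helper that builds a put-mapping body as a JSON string.
public class CsUriParamsMapping {

    // typeName is the document type in the index (placeholder: "logEntry").
    public static String mappingJson(String typeName) {
        return "{\n"
            + "  \"" + typeName + "\": {\n"
            + "    \"dynamic_templates\": [\n"
            + "      {\n"
            + "        \"uri_params_as_strings\": {\n"
            // Match every field below csUriParams, e.g. csUriParams.d
            + "          \"path_match\": \"csUriParams.*\",\n"
            // Always map those fields as plain, unanalyzed strings.
            + "          \"mapping\": { \"type\": \"string\", \"index\": \"not_analyzed\" }\n"
            + "        }\n"
            + "      }\n"
            + "    ]\n"
            + "  }\n"
            + "}";
    }

    public static void main(String[] args) {
        System.out.println(mappingJson("logEntry"));
    }
}
```

With something like this in place, an unexpected query parameter should no longer get its type guessed from whatever value happens to arrive first.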
As a rule of thumb, whenever you want full control over the index, mapping is the way to do it.

On 3/20/14 6:10 PM, Brian Stempin wrote:
I have unit tests for this MR job, and they show that the JSON output is a string as I'd expect, so Gson is most likely not the cause. I'm hesitant to show more code (owned by the work-place), but I can describe it a little bit further:

* The mapper gets a W3C log entry
* The log entry is broken into its components and put into document X
* The request URL is then taken and broken down into its query parameters, and the key-value pairs are put into document Y
* Some elements are then explicitly filtered from X and Y
* Those two documents are placed inside of document Z, which is ultimately what is serialized and sent to ES

We do have a base64 encoded param that we expect and handle appropriately. In this case, someone most likely sent it under the wrong param name, which is why it's making its way into document Y without further processing. Since it's being sent under a name that's not listed in the mapping, I expect it to just be treated as a string.

The only reason that I chose to go the Gson route vs building MapWritables is that building MapWritables is terribly verbose. Also, it comes with the overhead of having to wrap each String with a Text type, which just seems silly. Using the built-in JSON serializer is just not convenient in this case.

Brian

On Thu, Mar 20, 2014 at 11:18 AM, Costin Leau wrote:

My guess is that GSON adds the said field in its result. The base64 suggests that there's some binary data in the mix.

By the way, can you show more of your code? Any reason why you create the JSON yourself rather than just pass logEntryMap to es-hadoop? It can create the JSON for you, which is what I recommend; unless you have the JSON in HDFS, it's best to rely on es-hadoop to do it instead of an external tool.

Cheers,

On 3/20/14 4:48 PM, Brian Stempin wrote:

Hi,

All I'm doing is building a map and passing that to Gson for serialization.
A snippet from my map method:

    logEntryMap.put("cs(User-Agent)", values[9]);
    context.write(NullWritable.get(), new Text(gson.toJson(logEntryMap)));

values[] is a String array. Everything that goes into the map that gets serialized is a string. I do have es.input.json set to true.

This failure doesn't occur until >100,000,000 records are in the index, so it's happening late in the load process. The part that I find strange is that the field in question isn't in my mapping, and I've not touched the default mapping. I'm not sure why it would try to parse it as anything other than a string.

I'll turn on TRACE logging and see what happens.

Brian

On Wed, Mar 19, 2014 at 5:35 PM, Costin Leau wrote:

Hi,

How do you pass the JSON to es-hadoop? Do you have an example? By the way, you can enable TRACE logging on org.elasticsearch.hadoop and see everything that es-hadoop does, including the data that goes over the wire.

My guess is that the conversion of logs to JSON creates some extra artifacts which are later on interpreted as a Writable object (instead of raw JSON) by es-hadoop. Make sure you tell es-hadoop that its source is JSON (through es.input.json set to true).

The logs will likely confirm (or not) the above :)

Cheers,

On 3/19/14 11:14 PM, Brian Stempin wrote:

Hi List,

I have an ES cluster that takes in some data from our logs. We use Hadoop to parse the individual log entries into JSON strings and then do a bulk insert using ES's output format. For whatever reason, ES attempts to parse base64 strings as dates and fails.
Here's a line from one of my Hadoop logs:

    java.lang.IllegalStateException: Found unrecoverable error [Bad Request(400) - MapperParsingException[failed to parse [csUriParams.d]]; nested: MapperParsingException[failed to parse date field [REDACTED BASE64 STRING], tried both date format [dateOptionalTime], and timestamp number with locale []]; nested: IllegalArgumentException[Invalid format: "Y2lkPURFJml0ZW1zPWE2NTJjLXgxZTFj..."]; ]; Bailing out..
        at org.elasticsearch.hadoop.rest.RestClient.retryFailedEntries(RestClient.java:145)
        at org.elasticsearch.hadoop.rest.RestClient.bulk(RestClient.java:120)
        at org.elasticsearch.hadoop.rest.RestRepository.sendBatch(RestRepository.java:147)
        <SNIP>

csUriParams.d does not appear in my mapping, so I never explicitly asked for it to be treated as a date. Any idea why ES is trying to treat it as a date?

Thanks,
Brian
-- Costin -- You received this message because you are subscribed to the Google Groups "elasticsearch" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/532B16AF.7030701%40gmail.com. For more options, visit https://groups.google.com/d/optout.
