Hi list,

I have an RDD with a field that contains an ID I'd like to become the parent
document when I execute saveToEs (all authored in Scala). Something like
this...

{
    "units_sold": 100,
    "unit_price": 8.99,
    "revenue": 899,
    "parentId": "binlin\\staglow(L28AF)" // i.e. it has a single backslash in it
}

This works fine until my parent ID contains the backslash (\) character, at
which point I get an exception. Escaping the backslash (\\) doesn't work for
me either - the job runs successfully, but the _parent field is set to the
value with the double \\, so it doesn't reference the intended parent. I'd
love to remove the backslashes from my IDs, but this is, unfortunately, part
of a much bigger job :(
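To sanity-check the escaping behaviour itself, outside Spark entirely, I tried a quick sketch (plain Python, not my actual job) showing what a standard JSON serializer puts on the wire for a string holding one real backslash - the backslash gets doubled when serialized and restored to a single one when parsed, which suggests my single-backslash value is reaching the bulk request un-escaped:

```python
import json

# The document roughly as I build it in the RDD:
# "parentId" holds ONE real backslash character.
doc = {
    "units_sold": 100,
    "unit_price": 8.99,
    "revenue": 899,
    "parentId": "binlin\\staglow(L28AF)",  # Python escape; the string contains a single \
}

# A correct serializer doubles the backslash in the wire format...
wire = json.dumps(doc)
print(wire)

# ...and a correct parser restores the single backslash:
assert json.loads(wire)["parentId"] == "binlin\\staglow(L28AF)"
```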

Stack trace for the single-backslash version below....

15/01/28 17:05:46 WARN TaskSetManager: Lost task 3.3 in stage 0.2 (TID 425, SERVER1): org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: JsonParseException [Unrecognized character escape 's' (code 115) at [Source: [B@68700ab1; line: 1, column: 44]]; fragment[ent":"binlin\staglow]
        org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:322)
        org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:299)
        org.elasticsearch.hadoop.rest.RestClient.bulk(RestClient.java:149)
        org.elasticsearch.hadoop.rest.RestRepository.tryFlush(RestRepository.java:199)
        org.elasticsearch.hadoop.rest.RestRepository.flush(RestRepository.java:223)
        org.elasticsearch.hadoop.rest.RestRepository.close(RestRepository.java:236)
        org.elasticsearch.hadoop.rest.RestService$PartitionWriter.close(RestService.java:125)
        org.elasticsearch.spark.rdd.EsRDDWriter$$anonfun$write$1.apply$mcV$sp(EsRDDWriter.scala:33)
        org.apache.spark.TaskContext$$anon$2.onTaskCompletion(TaskContext.scala:99)
        org.apache.spark.TaskContext$$anonfun$markTaskCompleted$1.apply(TaskContext.scala:107)
        org.apache.spark.TaskContext$$anonfun$markTaskCompleted$1.apply(TaskContext.scala:107)
        scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        org.apache.spark.TaskContext.markTaskCompleted(TaskContext.scala:107)
        org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:64)
        org.apache.spark.scheduler.Task.run(Task.scala:54)
        org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
        java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        java.lang.Thread.run(Unknown Source)


Many thanks for any advice or workarounds,

Neil A

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/766edd46-5d68-4f6a-abc6-ea21b316ca56%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
