Hi,
I can't sort this out! I'm using Hadoop CDH3u6 and trying to get ES to
index my data. I tried both raw JSON and MapWritable, and I always get the
same kind of error:
java.lang.Exception: org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: [org.elasticsearch.hadoop.serialization.field.MapWritableFieldExtractor@35b5f7bd] cannot extract value from object [org.apache.hadoop.io.MapWritable@11c757a1]
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:349)
Caused by: org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: [org.elasticsearch.hadoop.serialization.field.MapWritableFieldExtractor@35b5f7bd] cannot extract value from object [org.apache.hadoop.io.MapWritable@11c757a1]
    at org.elasticsearch.hadoop.serialization.bulk.TemplatedBulk$FieldWriter.write(TemplatedBulk.java:49)
    at org.elasticsearch.hadoop.serialization.bulk.TemplatedBulk.writeTemplate(TemplatedBulk.java:101)
    at org.elasticsearch.hadoop.serialization.bulk.TemplatedBulk.write(TemplatedBulk.java:77)
    at org.elasticsearch.hadoop.rest.RestRepository.writeToIndex(RestRepository.java:130)
    at org.elasticsearch.hadoop.mr.EsOutputFormat$EsRecordWriter.write(EsOutputFormat.java:161)
    at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:531)
    at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
    at my.jobs.index.IndexMapper.map(IndexMapper.java:27)
    at my.jobs.index.IndexMapper.map(IndexMapper.java:19)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:648)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:322)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:218)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)
Everything seems right to me. Here is the configuration of the indexing job:
Job job = new Job(getConf(), "Indexing into Elastic search.");
job.setJarByClass(getClass());
DomainRankDriver.loadLibrariesToDistributedCache(job);
Path input = new Path(args[0]);
FileInputFormat.addInputPath(job, input);
FileOutputFormat.setOutputPath(job, new Path(args[1]));
// Used by ES-hadoop to take Text as Json
job.setOutputFormatClass(EsOutputFormat.class);
// job.setMapOutputValueClass(Text.class);
job.setMapOutputValueClass(MapWritable.class);
job.setMapperClass(IndexMapper.class);
job.setNumReduceTasks(0);
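For reference, the ES-specific settings are set on the job configuration
roughly like this (the index/type name and node address below are
placeholders, not my exact values):

```java
Configuration conf = job.getConfiguration();
// Target index/type -- placeholder value
conf.set("es.resource", "myindex/mytype");
// Address of an Elasticsearch node -- placeholder value
conf.set("es.nodes", "localhost:9200");
// Recommended when writing to ES: disable speculative execution,
// so duplicate task attempts don't write the same documents twice
conf.setBoolean("mapred.map.tasks.speculative.execution", false);
conf.setBoolean("mapred.reduce.tasks.speculative.execution", false);
```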
And my simple mapper:
@Override
public void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
    MapWritable map = new MapWritable();
    map.put(new Text("test"), new Text("value"));
    context.write(new LongWritable(), map);
}
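One thing I wonder about: as far as I understand, MapWritableFieldExtractor
is only invoked when ES-Hadoop has to pull a field out of the document, for
example when es.mapping.id is set. If that is the case here (an assumption
on my part, since my full config isn't shown), the sketch below illustrates
the requirement; the "id" field name is hypothetical:

```java
// Assumption: the job configuration contains something like
//   conf.set("es.mapping.id", "id");
// Then every emitted MapWritable must contain that key; if it is
// missing, MapWritableFieldExtractor cannot extract the value and
// fails with the exact error shown above.
MapWritable map = new MapWritable();
map.put(new Text("test"), new Text("value"));
map.put(new Text("id"), new Text("doc-1")); // required if es.mapping.id=id
context.write(new LongWritable(), map);
```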
Any clue where I should look next? I'm stuck.
Thanks,
Aurelien
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/7f6545ab-d6d9-4fdf-8923-0b60e0ea5297%40googlegroups.com.