[ https://issues.apache.org/jira/browse/AVRO-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17554953#comment-17554953 ]

Christophe Le Saec commented on AVRO-2787:
------------------------------------------

As development is done with Avro 1.9.2 while the job is executed with Avro 
1.8.2, this can happen, especially since the constructors of the Schema.Field 
class are not the same between the two versions (some were removed, others 
added).
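For illustration, the following sketch (the class name is invented here, not 
from the report) makes the same call that 
org.apache.avro.hadoop.io.AvroKeyValue.getSchema makes: it compiles against 
Avro 1.9.2, where Schema.Field has a constructor taking an Object default 
value, but throws this exact NoSuchMethodError when run against Avro 1.8.x, 
where the corresponding constructor takes a Jackson JsonNode instead:

{code:java}
import org.apache.avro.Schema;

public class FieldConstructorRepro {
    public static void main(String[] args) {
        // Compiled against Avro 1.9.x this resolves to
        // Field(String, Schema, String, Object); that overload does not
        // exist in 1.8.x, so an older runtime jar fails with
        // NoSuchMethodError on this line.
        Schema.Field field = new Schema.Field(
                "key", Schema.create(Schema.Type.STRING), null, (Object) null);
        System.out.println(field.name());
    }
}
{code}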

This can lead to a NoSuchMethodError without there being a bug in Avro itself. 
Could you try again after aligning the Avro versions at compile time and at 
run time?
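
Also, a quick way to check which Avro build the job actually loads at run time 
is to print where the Schema class comes from inside the job (a minimal 
diagnostic sketch; the class name is invented):

{code:java}
import org.apache.avro.Schema;

public class AvroClasspathCheck {
    public static void main(String[] args) {
        // Prints the jar that Schema was loaded from, e.g. a path ending in
        // avro-1.9.2.jar versus an older Avro bundled by the cluster.
        System.out.println(
                Schema.class.getProtectionDomain().getCodeSource().getLocation());
    }
}
{code}

If the printed jar is older than the one used at compile time (Hadoop 
distributions typically bundle their own Avro on the task classpath), that 
would explain the missing constructor.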

> Hadoop Mapreduce job fails when creating Writer
> -----------------------------------------------
>
>                 Key: AVRO-2787
>                 URL: https://issues.apache.org/jira/browse/AVRO-2787
>             Project: Apache Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.9.2
>         Environment: Development
>  * OS: Fedora 31
>  * Java version 8
>  * Gradle version 6.2.2
>  * Avro version 1.9.2
>  * Shadow version 5.2.0
>  * Gradle-avro-plugin version 0.19.1
> Running in a Podman container
>  * OS: Ubuntu 18.04
>  * Podman 1.8.2
>  * Hadoop version 3.2.1
>  * Java version 8
>            Reporter: Anton Oellerer
>            Priority: Blocker
>         Attachments: CategoryData.avsc, CategoryTokensReducer.java, 
> TextprocessingfundamentalsApplication.java
>
>
> Hey,
> I am trying to create a Hadoop pipeline that computes the chi-squared value 
> for tokens in reviews saved as JSON.
> For this I created multiple Hadoop jobs, which communicate in part via Avro 
> data containers.
> When trying to run this pipeline, I get the following error at the end of the 
> first reduce job, whose reducer has the signature
> {code:java}
> public class CategoryTokensReducer extends Reducer<Text, StringArrayWritable,
>         AvroKey<CharSequence>, AvroValue<CategoryData>>
> {code}
> The error:
> {code:java}
> java.lang.Exception: java.lang.NoSuchMethodError: org.apache.avro.Schema$Field.<init>(Ljava/lang/String;Lorg/apache/avro/Schema;Ljava/lang/String;Ljava/lang/Object;)V
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:492)
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:559)
> Caused by: java.lang.NoSuchMethodError: org.apache.avro.Schema$Field.<init>(Ljava/lang/String;Lorg/apache/avro/Schema;Ljava/lang/String;Ljava/lang/Object;)V
>         at org.apache.avro.hadoop.io.AvroKeyValue.getSchema(AvroKeyValue.java:111)
>         at org.apache.avro.mapreduce.AvroKeyValueRecordWriter.<init>(AvroKeyValueRecordWriter.java:84)
>         at org.apache.avro.mapreduce.AvroKeyValueOutputFormat.getRecordWriter(AvroKeyValueOutputFormat.java:70)
>         at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.<init>(ReduceTask.java:542)
>         at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:615)
>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:390)
>         at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:347)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> {code}
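> For context, the failing frame, AvroKeyValue.getSchema(AvroKeyValue.java:111), 
> is where Avro builds the record schema wrapping the output key and value. A 
> rough equivalent of that call for this job (a sketch, not the exact library 
> source) is:
> {code:java}
> // Sketch: the pair record schema the output writer builds for this job.
> // AvroKeyValue.getSchema constructs Schema.Field instances internally,
> // which is where the NoSuchMethodError above is thrown.
> Schema pairSchema = org.apache.avro.hadoop.io.AvroKeyValue.getSchema(
>         Schema.create(Schema.Type.STRING), CategoryData.getClassSchema());
> {code}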
> The job is set up like this:
> {code:java}
> Job jsonToCategoryTokensJob = Job.getInstance(conf, "json to category data");
> // Output key/value schemas for the Avro container files
> AvroJob.setOutputKeySchema(jsonToCategoryTokensJob, Schema.create(Schema.Type.STRING));
> AvroJob.setOutputValueSchema(jsonToCategoryTokensJob, CategoryData.getClassSchema());
> jsonToCategoryTokensJob.setJarByClass(TextprocessingfundamentalsApplication.class);
> jsonToCategoryTokensJob.setMapperClass(JsonToCategoryTokensMapper.class);
> jsonToCategoryTokensJob.setMapOutputKeyClass(Text.class);
> jsonToCategoryTokensJob.setMapOutputValueClass(StringArrayWritable.class);
> jsonToCategoryTokensJob.setReducerClass(CategoryTokensReducer.class);
> jsonToCategoryTokensJob.setOutputFormatClass(AvroKeyValueOutputFormat.class);
> String in = otherArgs.get(0);
> String out = otherArgs.get(1);
> FileInputFormat.addInputPath(jsonToCategoryTokensJob, new Path(in));
> FileOutputFormat.setOutputPath(jsonToCategoryTokensJob, new Path(out, "outCategoryData"));
> {code}
> The pipeline is run by first building a shadow jar from the source in the 
> development environment and then running it in a Podman container.
> With Avro 1.8.2 and gradle-avro-plugin 0.16.0 the reduce job works.
> Does anyone know what the problem here might be?
> Best regards,
> Anton



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
