[jira] [Commented] (FLINK-8186) AvroInputFormat regression: fails to deserialize GenericRecords on standalone cluster with hadoop27 compat

ASF GitHub Bot (JIRA) Mon, 04 Dec 2017 07:35:17 -0800

    [ 
https://issues.apache.org/jira/browse/FLINK-8186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16276952#comment-16276952
 ]


ASF GitHub Bot commented on FLINK-8186:
---------------------------------------

Github user StephanEwen commented on the issue:

    https://github.com/apache/flink/pull/5120
  
    Looks good to me. This is actually two fixes in one:
    
      1. Avro should not be in `flink-dist`: because its class names start with 
`org.apache.flink`, it would always be loaded *"parent first"*. That causes 
problems because Avro classes then exist multiple times, both in the parent and 
in the child classloader, so equality and `instanceof` comparisons fail.
    
      2. The Avro utils never used any code from the user code jar, only 
code on the classpath.
    
    This is good for now, but it shows that reverse classloading has some 
subtle implications as soon as Flink dependencies occur both in the user code 
jar and in `/lib`. 
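
To illustrate the first point, here is a minimal, self-contained sketch of why a class that exists in both a parent and a child classloader breaks equality and `instanceof` checks. It is not Flink or Avro code: `DuplicateClassDemo`, `Record`, and `ChildFirstLoader` are hypothetical names made up for this example, and it assumes the demo is compiled to `.class` files on the classpath so that the child loader can re-read the bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

class DuplicateClassDemo {

    /** Hypothetical stand-in for a class present in both the parent and the child classloader. */
    public static class Record {
        public Record() {}
    }

    /** Hypothetical child-first loader: defines the target class itself instead of delegating. */
    static class ChildFirstLoader extends ClassLoader {
        private final String target;

        ChildFirstLoader(ClassLoader parent, String target) {
            super(parent);
            this.target = target;
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            if (!name.equals(target)) {
                // Everything else (java.lang.Object, ...) is delegated to the parent as usual.
                return super.loadClass(name, resolve);
            }
            synchronized (getClassLoadingLock(name)) {
                Class<?> c = findLoadedClass(name);
                if (c == null) {
                    byte[] bytes = readClassBytes(name);
                    c = defineClass(name, bytes, 0, bytes.length);
                }
                if (resolve) {
                    resolveClass(c);
                }
                return c;
            }
        }

        /** Reads the compiled class file through the parent's resource lookup. */
        private byte[] readClassBytes(String name) throws ClassNotFoundException {
            String resource = name.replace('.', '/') + ".class";
            try (InputStream in = getParent().getResourceAsStream(resource)) {
                if (in == null) {
                    throw new ClassNotFoundException(name);
                }
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[4096];
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
                return out.toByteArray();
            } catch (IOException e) {
                throw new ClassNotFoundException(name, e);
            }
        }
    }

    /** Loads a second copy of Record through a fresh child-first loader. */
    static Class<?> loadChildCopy() {
        try {
            ClassLoader parent = DuplicateClassDemo.class.getClassLoader();
            String name = Record.class.getName();
            return new ChildFirstLoader(parent, name).loadClass(name);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Instantiates the given class via its public no-arg constructor. */
    static Object newInstance(Class<?> type) {
        try {
            return type.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Class<?> childCopy = loadChildCopy();
        Object childInstance = newInstance(childCopy);

        System.out.println("same name:  " + childCopy.getName().equals(Record.class.getName()));
        System.out.println("same Class: " + (childCopy == Record.class));
        System.out.println("instanceof: " + (childInstance instanceof Record));
    }
}
```

The two `Class` objects share a fully qualified name but are distinct types to the JVM, so the identity check and the `instanceof` check both come out false, and a cast of `childInstance` to `Record` would throw a `ClassCastException`.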


> AvroInputFormat regression: fails to deserialize GenericRecords on standalone 
> cluster with hadoop27 compat
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-8186
>                 URL: https://issues.apache.org/jira/browse/FLINK-8186
>             Project: Flink
>          Issue Type: Bug
>    Affects Versions: 1.4.0
>            Reporter: Sebastian Klemke
>            Assignee: Aljoscha Krettek
>            Priority: Blocker
>             Fix For: 1.4.0
>
>         Attachments: GenericRecordCount.java, pom.xml
>
>
> The following job runs fine on a Flink 1.3.2 cluster, but fails on a Flink 
> 1.4.0 RC2 standalone cluster, "hadoop27" flavour:
> {code}
> public class GenericRecordCount {
>     public static void main(String[] args) throws Exception {
>         String input = ParameterTool.fromArgs(args).getRequired("input");
>         ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
>         long count = env.readFile(
>                 new AvroInputFormat<>(new Path(input), GenericRecord.class), input)
>                 .count();
>         System.out.printf("Counted %d records\n", count);
>     }
> }
> {code}
> It runs fine in a LocalExecutionEnvironment and also on a no-hadoop flavour 
> standalone cluster, though. Exception thrown on Flink 1.4.0 hadoop27:
> {code}
> 12/01/2017 13:22:09     DataSource (at readFile(ExecutionEnvironment.java:514) (org.apache.flink.formats.avro.AvroInputFormat))(4/4) switched to FAILED
> java.lang.RuntimeException: java.lang.NoSuchMethodException: org.apache.avro.generic.GenericRecord.<init>()
>         at org.apache.avro.specific.SpecificData.newInstance(SpecificData.java:353)
>         at org.apache.avro.specific.SpecificData.newRecord(SpecificData.java:369)
>         at org.apache.avro.reflect.ReflectData.newRecord(ReflectData.java:901)
>         at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:212)
>         at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:175)
>         at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
>         at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:145)
>         at org.apache.avro.file.DataFileStream.next(DataFileStream.java:233)
>         at org.apache.flink.formats.avro.AvroInputFormat.nextRecord(AvroInputFormat.java:165)
>         at org.apache.flink.runtime.operators.DataSourceTask.invoke(DataSourceTask.java:167)
>         at org.apache.flink.runtime.taskmanager.Task.run(Task.java:718)
>         at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.NoSuchMethodException: org.apache.avro.generic.GenericRecord.<init>()
>         at java.lang.Class.getConstructor0(Class.java:3082)
>         at java.lang.Class.getDeclaredConstructor(Class.java:2178)
>         at org.apache.avro.specific.SpecificData.newInstance(SpecificData.java:347)
>         ... 11 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
