
Nira Amit reopened SPARK-19424:
-------------------------------

I managed to get my code to work, yes. The problem of the wrong type at runtime 
is not resolved, though. I'm assuming your API suppresses an "unchecked cast" 
warning when it assigns the value returned by the avro library to the datum 
field of my custom AvroKey. This is a problem because it violates the 
type-safety guarantees expected from Java APIs: my code shouldn't break 
unexpectedly with a ClassCastException from a cast I never wrote. The API 
should throw the exception at the point of the erroneous assignment instead of 
silently performing it.
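To illustrate the general pattern I mean, outside of Spark (a minimal 
standalone sketch, not your code): a single unchecked cast compiles with 
nothing more than a warning, and the ClassCastException then surfaces later at 
the use site, far away from the assignment that actually went wrong.
{code}
import java.util.ArrayList;
import java.util.List;

public class UncheckedCastDemo {

    // Stands in for a library method that trusts an unchecked cast.
    @SuppressWarnings("unchecked")
    static <T> List<T> loadRecords() {
        List<Object> raw = new ArrayList<>();
        raw.add("not what the caller asked for");
        // The compiler only warns here; nothing is verified at runtime.
        return (List<T>) raw;
    }

    public static void main(String[] args) {
        List<Integer> records = loadRecords();
        // The ClassCastException is thrown here, at the use site,
        // not at the erroneous assignment above.
        Integer first = records.get(0);
        System.out.println(first);
    }
}
{code}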

> Wrong runtime type in RDD when reading from avro with custom serializer
> -----------------------------------------------------------------------
>
>                 Key: SPARK-19424
>                 URL: https://issues.apache.org/jira/browse/SPARK-19424
>             Project: Spark
>          Issue Type: Bug
>          Components: Java API
>    Affects Versions: 2.0.2
>         Environment: Ubuntu, spark 2.0.2 prebuilt for hadoop 2.7
>            Reporter: Nira Amit
>
> I am trying to read data from avro files into an RDD using Kryo. My code 
> compiles fine, but at runtime I get a ClassCastException. Here is what my 
> code does:
> {code}
> SparkConf conf = new SparkConf()...
> conf.set("spark.serializer", KryoSerializer.class.getCanonicalName());
> conf.set("spark.kryo.registrator", MyKryoRegistrator.class.getName());
> JavaSparkContext sc = new JavaSparkContext(conf);
> {code}
> Where MyKryoRegistrator registers a Serializer for MyCustomClass:
> {code}
> public void registerClasses(Kryo kryo) {
>     kryo.register(MyCustomClass.class, new MyCustomClassSerializer());
> }
> {code}
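> For reference, my serializer is shaped roughly like this (the field and the 
> constructor shown here are simplified placeholders for the real ones):
> {code}
> import com.esotericsoftware.kryo.Kryo;
> import com.esotericsoftware.kryo.Serializer;
> import com.esotericsoftware.kryo.io.Input;
> import com.esotericsoftware.kryo.io.Output;
>
> public class MyCustomClassSerializer extends Serializer<MyCustomClass> {
>     @Override
>     public void write(Kryo kryo, Output output, MyCustomClass object) {
>         // Write out the object's state field by field.
>         output.writeString(object.getSomeCustomField());
>     }
>
>     @Override
>     public MyCustomClass read(Kryo kryo, Input input, Class<MyCustomClass> type) {
>         // Rebuild the object from the serialized state.
>         return new MyCustomClass(input.readString());
>     }
> }
> {code}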
> Then, I read my datafile:
> {code}
> JavaPairRDD<MyCustomClass, NullWritable> records =
>         sc.newAPIHadoopFile("file:/path/to/datafile.avro",
>                 AvroKeyInputFormat.class, MyCustomClass.class, NullWritable.class,
>                 sc.hadoopConfiguration());
> Tuple2<MyCustomClass, NullWritable> first = records.first();
> {code}
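> For reference, the newAPIHadoopFile overload I'm calling here has roughly the 
> following generic signature (simplified from JavaSparkContext), so 
> MyCustomClass is simply inferred for K:
> {code}
> // Simplified shape of the JavaSparkContext method used above:
> public <K, V, F extends InputFormat<K, V>> JavaPairRDD<K, V> newAPIHadoopFile(
>         String path, Class<F> fClass, Class<K> kClass, Class<V> vClass, Configuration conf)
> {code}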
> This seems to work fine, but using a debugger I can see that while the RDD 
> has a kClassTag of my.package.containing.MyCustomClass, the variable first 
> contains a Tuple2<AvroKey, NullWritable>, not Tuple2<MyCustomClass, 
> NullWritable>! And indeed, when the following line executes:
> {code}
> System.out.println("Got a result, custom field is: " + first._1.getSomeCustomField());
> {code}
> I get an exception:
> {code}
> java.lang.ClassCastException: org.apache.avro.mapred.AvroKey cannot be cast to my.package.containing.MyCustomClass
> {code}
> Am I doing something wrong? And even if so, shouldn't I get a compilation 
> error rather than a runtime error?


