[
https://issues.apache.org/jira/browse/SPARK-19424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nira Amit reopened SPARK-19424:
-------------------------------
Re-opening this issue because throwing unexpected ClassCastExceptions is not
acceptable behavior for a Java API:
"This sort of unexpected ClassCastException is considered a violation of the
type-safety principle" (source:
http://www.angelikalanger.com/GenericsFAQ/FAQSections/ParameterizedTypes.html#FAQ006)
"[The type-safety] principle is very important – we don’t want the implicit
casts added when compiling generic code to raise runtime exceptions, since they
would be hard to understand and fix". (source:
https://eyalsch.wordpress.com/tag/type-safety/).
I created a GitHub repository with a complete test-app that reproduces this
behavior: https://github.com/homosepian/spark-avro-kryo
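To make the cited principle concrete, here is a minimal, self-contained Java
example (hypothetical names, nothing from Spark): the unchecked cast compiles
cleanly, and the ClassCastException only surfaces later, at a cast the
compiler inserted far from the actual mistake:
{code}
import java.util.ArrayList;
import java.util.List;

public class TypeSafetyDemo {
    @SuppressWarnings("unchecked")
    static <T> List<T> load(Class<T> ignored) {
        List<Object> raw = new ArrayList<>();
        raw.add("a String, regardless of what the caller asked for");
        return (List<T>) raw; // unchecked cast: compiles, but lies about T
    }

    public static void main(String[] args) {
        // The compiler believes this is a List<Integer>...
        List<Integer> numbers = load(Integer.class);
        // ...so it inserts an implicit cast HERE, and that cast throws
        // java.lang.ClassCastException at runtime.
        Integer first = numbers.get(0);
        System.out.println(first);
    }
}
{code}
This is exactly the shape of the failure described below: newAPIHadoopFile
trusts the key Class it is given, and the implicit cast only fires when
first._1 is actually used.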
> Wrong runtime type in RDD when reading from avro with custom serializer
> -----------------------------------------------------------------------
>
> Key: SPARK-19424
> URL: https://issues.apache.org/jira/browse/SPARK-19424
> Project: Spark
> Issue Type: Bug
> Components: Java API
> Affects Versions: 2.0.2
> Environment: Ubuntu, spark 2.0.2 prebuilt for hadoop 2.7
> Reporter: Nira Amit
>
> I am trying to read data from Avro files into an RDD using Kryo. My code
> compiles fine, but at runtime I get a ClassCastException. Here is what
> my code does:
> {code}
> SparkConf conf = new SparkConf()...
> conf.set("spark.serializer", KryoSerializer.class.getCanonicalName());
> conf.set("spark.kryo.registrator", MyKryoRegistrator.class.getName());
> JavaSparkContext sc = new JavaSparkContext(conf);
> {code}
> Where MyKryoRegistrator registers a Serializer for MyCustomClass:
> {code}
> public void registerClasses(Kryo kryo) {
>     kryo.register(MyCustomClass.class, new MyCustomClassSerializer());
> }
> {code}
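> The full registrator class is just the standard KryoRegistrator boilerplate
> around that method; a sketch (the serializer itself extends Kryo's
> Serializer<MyCustomClass> and is omitted here):
> {code}
> import com.esotericsoftware.kryo.Kryo;
> import org.apache.spark.serializer.KryoRegistrator;
>
> public class MyKryoRegistrator implements KryoRegistrator {
>     @Override
>     public void registerClasses(Kryo kryo) {
>         // Serialize MyCustomClass with the custom serializer.
>         kryo.register(MyCustomClass.class, new MyCustomClassSerializer());
>     }
> }
> {code}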
> Then I read my data file:
> {code}
> JavaPairRDD<MyCustomClass, NullWritable> records =
>         sc.newAPIHadoopFile("file:/path/to/datafile.avro",
>                 AvroKeyInputFormat.class, MyCustomClass.class,
>                 NullWritable.class, sc.hadoopConfiguration());
> Tuple2<MyCustomClass, NullWritable> first = records.first();
> {code}
> This seems to work fine, but using a debugger I can see that while the RDD
> has a kClassTag of my.package.containing.MyCustomClass, the variable first
> contains a Tuple2<AvroKey, NullWritable>, not Tuple2<MyCustomClass,
> NullWritable>! And indeed, when the following line executes:
> {code}
> System.out.println("Got a result, custom field is: " +
>         first._1.getSomeCustomField());
> {code}
> I get an exception:
> {code}
> java.lang.ClassCastException: org.apache.avro.mapred.AvroKey cannot be cast
> to my.package.containing.MyCustomClass
> {code}
> Am I doing something wrong? And even if so, shouldn't I get a compilation
> error rather than a runtime error?
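> For comparison, here is a sketch of the same read declared with the key type
> that AvroKeyInputFormat actually emits, followed by an explicit conversion.
> The GenericRecord payload and the fromGenericRecord helper are assumptions
> for illustration, not code from my app:
> {code}
> // Assumes org.apache.avro.mapred.AvroKey and
> // org.apache.avro.generic.GenericRecord are imported.
> JavaPairRDD<AvroKey, NullWritable> avroRecords =
>         sc.newAPIHadoopFile("file:/path/to/datafile.avro",
>                 AvroKeyInputFormat.class, AvroKey.class,
>                 NullWritable.class, sc.hadoopConfiguration());
> // fromGenericRecord is a hypothetical MyCustomClass factory method.
> JavaRDD<MyCustomClass> converted = avroRecords.map(
>         pair -> fromGenericRecord((GenericRecord) pair._1.datum()));
> {code}
> Declared this way, the cast is explicit and visible at the call site instead
> of being inserted silently by the compiler.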