[ https://issues.apache.org/jira/browse/SPARK-19424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved SPARK-19424.
-------------------------------
    Resolution: Not A Problem

You resolved the problem above yourself, though: you hadn't passed the correct 
arguments to this function. Nobody suggested you suppress a 
ClassCastException, and correct usage does not produce one. I am honestly not 
clear what you are asking for: even if we could break the API to change it, 
what would it look like? As I say, you can at best blame the Hadoop API for 
the structure this mirrors, but I gave a more nuanced explanation in the email 
thread. It involves Scala, which I gather is new to you, and is worth 
re-reading.
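
For reference, here is a rough sketch of the kind of usage I mean. It is only 
a sketch: the point is that the key class you pass must be what 
AvroKeyInputFormat actually produces, namely AvroKey, which you then unwrap; 
the field name at the end is assumed.

{code}
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.mapred.AvroKey;
import org.apache.avro.mapreduce.AvroKeyInputFormat;
import org.apache.hadoop.io.NullWritable;
import org.apache.spark.api.java.JavaPairRDD;

// Given the JavaSparkContext sc from your snippet. Declare the key type as
// AvroKey, because that is what the InputFormat produces, then unwrap it.
JavaPairRDD<AvroKey, NullWritable> records =
    sc.newAPIHadoopFile("file:/path/to/datafile.avro",
        AvroKeyInputFormat.class, AvroKey.class, NullWritable.class,
        sc.hadoopConfiguration());

// Without a reader schema configured, the datum is a GenericRecord.
GenericRecord first = (GenericRecord) records.first()._1().datum();
System.out.println(first.get("someCustomField")); // field name assumed
{code}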

Yes, you have a business need, but that does not entitle you to a specific 
outcome or to anyone's time. You've received more than a fair amount of help 
here. What is not OK is to keep reopening this issue with no change and no 
proposal; that is a quick way to ensure you get no help in the future.

This part of the discussion is clearly done, and I will close it one more time. 
Please leave it closed. Your next step is to step back, decide what code 
change you would actually propose, and reply to your earlier thread on the 
mailing list. If there is support for it there, we can open a new JIRA. 

> Wrong runtime type in RDD when reading from avro with custom serializer
> -----------------------------------------------------------------------
>
>                 Key: SPARK-19424
>                 URL: https://issues.apache.org/jira/browse/SPARK-19424
>             Project: Spark
>          Issue Type: Bug
>          Components: Java API
>    Affects Versions: 2.0.2
>         Environment: Ubuntu, spark 2.0.2 prebuilt for hadoop 2.7
>            Reporter: Nira Amit
>
> I am trying to read data from Avro files into an RDD using Kryo. My code 
> compiles fine, but at runtime I'm getting a ClassCastException. Here is what 
> my code does:
> {code}
> SparkConf conf = new SparkConf()...
> conf.set("spark.serializer", KryoSerializer.class.getCanonicalName());
> conf.set("spark.kryo.registrator", MyKryoRegistrator.class.getName());
> JavaSparkContext sc = new JavaSparkContext(conf);
> {code}
> Where MyKryoRegistrator registers a Serializer for MyCustomClass:
> {code}
> public class MyKryoRegistrator implements KryoRegistrator {
>     @Override
>     public void registerClasses(Kryo kryo) {
>         kryo.register(MyCustomClass.class, new MyCustomClassSerializer());
>     }
> }
> {code}
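> (For context, a hypothetical skeleton of such a serializer, since the 
> original is not shown; the field and its accessors are assumed, and this is 
> the Kryo 3.x API that Spark 2.x ships with:)
> {code}
> import com.esotericsoftware.kryo.Kryo;
> import com.esotericsoftware.kryo.Serializer;
> import com.esotericsoftware.kryo.io.Input;
> import com.esotericsoftware.kryo.io.Output;
>
> public class MyCustomClassSerializer extends Serializer<MyCustomClass> {
>     @Override
>     public void write(Kryo kryo, Output output, MyCustomClass object) {
>         // Write each field explicitly; one assumed String field here.
>         output.writeString(object.getSomeCustomField());
>     }
>
>     @Override
>     public MyCustomClass read(Kryo kryo, Input input, Class<MyCustomClass> type) {
>         MyCustomClass obj = new MyCustomClass();
>         obj.setSomeCustomField(input.readString()); // assumed setter
>         return obj;
>     }
> }
> {code}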
> Then, I read my datafile:
> {code}
> JavaPairRDD<MyCustomClass, NullWritable> records =
>     sc.newAPIHadoopFile("file:/path/to/datafile.avro",
>         AvroKeyInputFormat.class, MyCustomClass.class, NullWritable.class,
>         sc.hadoopConfiguration());
> Tuple2<MyCustomClass, NullWritable> first = records.first();
> {code}
> This seems to work fine, but using a debugger I can see that while the RDD 
> has a kClassTag of my.package.containing.MyCustomClass, the variable first 
> contains a Tuple2<AvroKey, NullWritable>, not Tuple2<MyCustomClass, 
> NullWritable>! And indeed, when the following line executes:
> {code}
> System.out.println("Got a result, custom field is: " + first._1().getSomeCustomField());
> {code}
> I get an exception:
> {code}
> java.lang.ClassCastException: org.apache.avro.mapred.AvroKey cannot be cast to my.package.containing.MyCustomClass
> {code}
> Am I doing something wrong? And even if so, shouldn't I get a compilation 
> error rather than a runtime error?
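> (A minimal, self-contained illustration of why this compiles anyway: an API 
> that takes a Class<K> token fixes K at compile time, but generics are 
> erased, so nothing checks what is really produced until the value is used 
> as a K. All names below are made up; this is not Spark code:)
> {code}
> import java.util.ArrayList;
> import java.util.List;
>
> public class ErasureDemo {
>     // The caller picks K via the class token, but the body can return
>     // anything; the unchecked cast is never verified at runtime.
>     @SuppressWarnings("unchecked")
>     static <K> List<K> loadKeys(Class<K> kClass) {
>         List<Object> raw = new ArrayList<>();
>         raw.add("really a String");     // like AvroKey vs. MyCustomClass
>         return (List<K>) raw;
>     }
>
>     public static void main(String[] args) {
>         List<Integer> keys = loadKeys(Integer.class); // compiles fine
>         Integer first = keys.get(0); // ClassCastException only here
>     }
> }
> {code}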


