[jira] [Commented] (SPARK-19424) Wrong runtime type in RDD when reading from avro with custom serializer

Nira Amit (JIRA) Wed, 01 Feb 2017 09:06:20 -0800

    [ 
https://issues.apache.org/jira/browse/SPARK-19424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15848634#comment-15848634
 ]


Nira Amit commented on SPARK-19424:
-----------------------------------

[~srowen] So shouldn't this give me a compilation error? I'm confused about how 
to load custom data types from Avro. There are many examples in Scala but not 
in Java. I just tried the following:
{code}
JavaPairRDD<AvroKey<MyCustomClass>, NullWritable> records =
                sc.newAPIHadoopFile("file:/path/to/datafile.avro",
                new AvroKeyInputFormat<MyCustomClass>().getClass(),
                new AvroKey<MyCustomClass>().getClass(), NullWritable.class,
                sc.hadoopConfiguration());
{code}
And this DOES give a compilation error:
{code}
Error:(253, 36) java: incompatible types: inferred type does not conform to equality constraint(s)
    inferred: org.apache.avro.mapred.AvroKey<my.package.containing.MyCustomClass>
    equality constraints(s): org.apache.avro.mapred.AvroKey<my.package.containing.MyCustomClass>,capture#1 of ? extends org.apache.avro.mapred.AvroKey
{code}
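
For context on the error above, here is a minimal plain-Java sketch (not Spark code). Calling getClass() on an expression of static type AvroKey<MyCustomClass> yields a Class<? extends AvroKey>, i.e. a token of the erased type, so the compiler cannot make it equal to a key type of AvroKey<MyCustomClass>. The sketch uses ArrayList<String> as a stand-in for AvroKey<MyCustomClass>; emptyListOf and stringListClass are hypothetical names introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class ClassTokenDemo {
    // Hypothetical stand-in for an API that uses K both in a Class<K>
    // argument and in an invariant position of its return type.
    static <K> List<K> emptyListOf(Class<K> keyClass) {
        return new ArrayList<>();
    }

    // Parameterized types have no class literal, so a typed token needs
    // an unchecked double cast through Class<?>.
    @SuppressWarnings("unchecked")
    static Class<ArrayList<String>> stringListClass() {
        return (Class<ArrayList<String>>) (Class<?>) ArrayList.class;
    }

    // At runtime both routes give the very same Class object.
    static boolean sameRuntimeClass() {
        Class<?> fromInstance = new ArrayList<String>().getClass();
        return fromInstance == stringListClass();
    }

    public static void main(String[] args) {
        // Rejected by javac, analogous to the error quoted above, because
        // getClass() is typed as Class<? extends ArrayList>:
        // List<ArrayList<String>> bad =
        //         emptyListOf(new ArrayList<String>().getClass());

        // Accepted: the token's type argument matches the target type.
        List<ArrayList<String>> ok = emptyListOf(stringListClass());
        ok.add(new ArrayList<>());
        System.out.println(ok.size() + " " + sameRuntimeClass());
    }
}
```

Presumably the same double cast could be applied to AvroKey.class and AvroKeyInputFormat.class to obtain tokens that satisfy newAPIHadoopFile. That is an assumption and has not been checked against Spark 2.0.2 here.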


> Wrong runtime type in RDD when reading from avro with custom serializer
> -----------------------------------------------------------------------
>
>                 Key: SPARK-19424
>                 URL: https://issues.apache.org/jira/browse/SPARK-19424
>             Project: Spark
>          Issue Type: Bug
>          Components: Java API
>    Affects Versions: 2.0.2
>         Environment: Ubuntu, spark 2.0.2 prebuilt for hadoop 2.7
>            Reporter: Nira Amit
>
> I am trying to read data from Avro files into an RDD using Kryo. My code 
> compiles fine, but at runtime I get a ClassCastException. Here is what 
> my code does:
> {code}
> SparkConf conf = new SparkConf()...
> conf.set("spark.serializer", KryoSerializer.class.getCanonicalName());
> conf.set("spark.kryo.registrator", MyKryoRegistrator.class.getName());
> JavaSparkContext sc = new JavaSparkContext(conf);
> {code}
> Where MyKryoRegistrator registers a Serializer for MyCustomClass:
> {code}
> public void registerClasses(Kryo kryo) {
>     kryo.register(MyCustomClass.class, new MyCustomClassSerializer());
> }
> {code}
> Then, I read my datafile:
> {code}
> JavaPairRDD<MyCustomClass, NullWritable> records =
>                 sc.newAPIHadoopFile("file:/path/to/datafile.avro",
>                 AvroKeyInputFormat.class, MyCustomClass.class, NullWritable.class,
>                 sc.hadoopConfiguration());
> Tuple2<MyCustomClass, NullWritable> first = records.first();
> {code}
> This seems to work fine, but using a debugger I can see that while the RDD 
> has a kClassTag of my.package.containing.MyCustomClass, the variable first 
> contains a Tuple2<AvroKey, NullWritable>, not Tuple2<MyCustomClass, 
> NullWritable>! And indeed, when the following line executes:
> {code}
> System.out.println("Got a result, custom field is: " +
>         first._1.getSomeCustomField());
> {code}
> I get an exception:
> {code}
> java.lang.ClassCastException: org.apache.avro.mapred.AvroKey cannot be cast to my.package.containing.MyCustomClass
> {code}
> Am I doing something wrong? And even so, shouldn't I get a compilation error 
> rather than a runtime error?
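
On the compile-time versus runtime question in the description above, the following is a minimal sketch of the general Java mechanism. It is plain Java and makes no claim about Spark's internals. Generic type arguments are erased, so a method that accepts a Class<K> token can return objects of another type without the compiler noticing. The failure appears only where the compiler-inserted cast runs. Wrapper and Custom are hypothetical stand-ins for AvroKey and MyCustomClass, and load is a hypothetical stand-in for the reading call.

```java
import java.util.ArrayList;
import java.util.List;

final class ErasureDemo {
    // Hypothetical stand-in for AvroKey: wraps the real record.
    static final class Wrapper {
        final Object datum;
        Wrapper(Object datum) { this.datum = datum; }
    }

    // Hypothetical stand-in for MyCustomClass.
    static final class Custom {
        String getSomeCustomField() { return "value"; }
    }

    // Mimics a loader that accepts a key class token but returns whatever
    // the underlying reader produces (here: always Wrapper objects).
    @SuppressWarnings("unchecked")
    static <K> List<K> load(Class<K> keyClass) {
        List<Object> produced = new ArrayList<>();
        produced.add(new Wrapper(new Custom()));
        // Unchecked cast: nothing verifies the elements against keyClass.
        return (List<K>) (List<?>) produced;
    }

    static String readFirstField() {
        // Compiles without error: K is inferred as Custom from the token.
        List<Custom> records = load(Custom.class);
        try {
            // The compiler-inserted cast to Custom fails on this line.
            Custom first = records.get(0);
            return first.getSomeCustomField();
        } catch (ClassCastException e) {
            return "ClassCastException";
        }
    }

    public static void main(String[] args) {
        System.out.println(readFirstField()); // prints "ClassCastException"
    }
}
```

In this sketch the wrapped record is still reachable through the wrapper's datum field, which suggests treating the loaded keys as wrappers and unwrapping them explicitly rather than declaring the record type as the key.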



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
