[ https://issues.apache.org/jira/browse/SPARK-19424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15848981#comment-15848981 ]

Nira Amit commented on SPARK-19424:
-----------------------------------

[~srowen] thanks for elaborating, but in that case I just can't find a way to 
do this in Java other than loading GenericData.Record and converting it to my 
class afterwards (I sketch that workaround below, right after the Scala 
snippet). From what I've googled, it appears that in Scala it is possible to 
do it this way:
{code}
ctx.hadoopFile("/path/to/the/avro/file.avro",
  classOf[AvroInputFormat[MyClassInAvroFile]],
  classOf[AvroWrapper[MyClassInAvroFile]],
  classOf[NullWritable])
{code}
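
To be concrete, the GenericData.Record workaround I mean is something like the 
following. This is just a sketch: the constructor and the "id" field of 
MyCustomClass are made up for illustration, only getSomeCustomField appears in 
my real code.
{code}
// Read generic records first, then convert each one manually afterwards.
JavaPairRDD<AvroKey, NullWritable> generic =
        sc.newAPIHadoopFile("file:/path/to/datafile.avro",
                AvroKeyInputFormat.class, AvroKey.class, NullWritable.class,
                sc.hadoopConfiguration());

JavaRDD<MyCustomClass> converted = generic.map(tuple -> {
    // With no reader schema configured, the datum comes back generic.
    GenericData.Record record = (GenericData.Record) tuple._1.datum();
    return new MyCustomClass((Long) record.get("id"),
            record.get("someCustomField").toString());
});
{code}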
So I tried to "translate" the Scala call above to Java as best I could (hence 
the funky way of getting the Class objects), but nothing works. I also tried 
with classes that extend AvroKey and AvroKeyInputFormat:
{code}
public static class MyCustomInputFormat extends AvroKeyInputFormat<MyCustomClass> {}
public static class MyCustomAvroKey extends AvroKey<MyCustomClass> {}
...
JavaPairRDD<MyCustomAvroKey, NullWritable> records =
        sc.newAPIHadoopFile("file:/path/to/datafile.avro",
                MyCustomInputFormat.class, MyCustomAvroKey.class,
                NullWritable.class,
                sc.hadoopConfiguration());
{code}
But again I'm getting a compilation error:
{code}
    inferred type does not conform to equality constraint(s)
    inferred: 
org.apache.avro.mapred.AvroKey<my.package.containing.MyCustomClass>
    equality constraints(s): 
org.apache.avro.mapred.AvroKey<my.package.containing.MyCustomClass>,my.package.containing.MyAvroTest.MyCustomAvroKey
{code}
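If I read the error right, MyCustomInputFormat pins the key type to 
AvroKey<MyCustomClass>, so the kClass argument can't be MyCustomAvroKey. The 
only variant I could get to compile is with an unchecked cast of the class 
literal. Again just a sketch, and presumably not the intended usage:
{code}
@SuppressWarnings("unchecked")
JavaPairRDD<AvroKey<MyCustomClass>, NullWritable> records =
        sc.newAPIHadoopFile("file:/path/to/datafile.avro",
                MyCustomInputFormat.class,
                // AvroKey.class is raw, so force it to the parameterized
                // key type that MyCustomInputFormat actually produces:
                (Class<AvroKey<MyCustomClass>>) (Class<?>) AvroKey.class,
                NullWritable.class,
                sc.hadoopConfiguration());
{code}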
Is this just not possible in Java?

> Wrong runtime type in RDD when reading from avro with custom serializer
> -----------------------------------------------------------------------
>
>                 Key: SPARK-19424
>                 URL: https://issues.apache.org/jira/browse/SPARK-19424
>             Project: Spark
>          Issue Type: Bug
>          Components: Java API
>    Affects Versions: 2.0.2
>         Environment: Ubuntu, spark 2.0.2 prebuilt for hadoop 2.7
>            Reporter: Nira Amit
>
> I am trying to read data from Avro files into an RDD using Kryo. My code 
> compiles fine, but at runtime I'm getting a ClassCastException. Here is what 
> my code does:
> {code}
> SparkConf conf = new SparkConf()...
> conf.set("spark.serializer", KryoSerializer.class.getCanonicalName());
> conf.set("spark.kryo.registrator", MyKryoRegistrator.class.getName());
> JavaSparkContext sc = new JavaSparkContext(conf);
> {code}
> Where MyKryoRegistrator registers a Serializer for MyCustomClass:
> {code}
> public void registerClasses(Kryo kryo) {
>     kryo.register(MyCustomClass.class, new MyCustomClassSerializer());
> }
> {code}
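> For context, the serializer registered there follows the usual Kryo pattern, 
> roughly like this (the "id" field and the constructor are made up; only 
> getSomeCustomField is referenced later in this report):
> {code}
> public class MyCustomClassSerializer extends Serializer<MyCustomClass> {
>     @Override
>     public void write(Kryo kryo, Output output, MyCustomClass object) {
>         output.writeLong(object.getId());
>         output.writeString(object.getSomeCustomField());
>     }
>
>     @Override
>     public MyCustomClass read(Kryo kryo, Input input, Class<MyCustomClass> type) {
>         return new MyCustomClass(input.readLong(), input.readString());
>     }
> }
> {code}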
> Then, I read my datafile:
> {code}
> JavaPairRDD<MyCustomClass, NullWritable> records =
>         sc.newAPIHadoopFile("file:/path/to/datafile.avro",
>                 AvroKeyInputFormat.class, MyCustomClass.class,
>                 NullWritable.class,
>                 sc.hadoopConfiguration());
> Tuple2<MyCustomClass, NullWritable> first = records.first();
> {code}
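> I would have expected the compiler to flag this; presumably it compiles only 
> because AvroKeyInputFormat.class is passed as a raw class literal, making the 
> call unchecked, so K is taken from kClass without being checked against the 
> input format. The signature being matched is, in Java terms:
> {code}
> // JavaSparkContext.newAPIHadoopFile, Spark 2.0.x:
> public <K, V, F extends InputFormat<K, V>> JavaPairRDD<K, V> newAPIHadoopFile(
>         String path, Class<F> fClass, Class<K> kClass, Class<V> vClass,
>         Configuration conf)
> {code}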
> This seems to work fine, but using a debugger I can see that while the RDD 
> has a kClassTag of my.package.containing.MyCustomClass, the variable first 
> contains a Tuple2<AvroKey, NullWritable>, not Tuple2<MyCustomClass, 
> NullWritable>! And indeed, when the following line executes:
> {code}
> System.out.println("Got a result, custom field is: " + first._1.getSomeCustomField());
> {code}
> I get an exception:
> {code}
> java.lang.ClassCastException: org.apache.avro.mapred.AvroKey cannot be cast to my.package.containing.MyCustomClass
> {code}
> Am I doing something wrong? And even if I am, shouldn't I get a compilation 
> error rather than a runtime error?


