[ 
https://issues.apache.org/jira/browse/SPARK-19656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15896194#comment-15896194
 ] 

Eric Maynard edited comment on SPARK-19656 at 3/5/17 11:51 AM:
---------------------------------------------------------------

Here is a complete working example in Java:

{code:title=AvroTest.java|borderStyle=solid}
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;
import java.util.ArrayList;

public class AvroTest {

    public static void main(String[] args) {

        // build the Spark session:
        System.setProperty("hadoop.home.dir", "C:\\Hadoop"); // Windows hack
        SparkSession spark = SparkSession.builder().master("local").appName("Avro Test")
                .config("spark.sql.warehouse.dir", "file:///c:/tmp/spark-warehouse") // another Windows hack
                .getOrCreate();

        // create data:
        ArrayList<CustomClass> list = new ArrayList<>();
        CustomClass cc = new CustomClass();
        cc.setA(5);
        cc.setB(6);
        list.add(cc);
        spark.createDataFrame(list, CustomClass.class)
                .write().mode(SaveMode.Overwrite)
                .format("com.databricks.spark.avro")
                .save("C:\\tmp\\file.avro");

        // read data:
        Row row = spark.read().format("com.databricks.spark.avro").load("C:\\tmp\\file.avro").head();
        System.out.println(row);
        System.out.println(row.get(0));
        System.out.println(row.get(1));
        System.out.println("Success =\t" + ((Integer) row.get(0) == 5));
    }
}
{code}

With a simple custom class:
{code:title=CustomClass.java|borderStyle=solid}
import java.io.Serializable;

public class CustomClass implements Serializable {
    private int a;
    public void setA(int value){this.a = value;}
    public int getA(){return this.a;}

    private int b;
    public void setB(int value) {this.b = value;}
    public int getB(){return this.b;}
}
{code}  
  
Everything looks OK to me, and after running, stdout looks like this:
{code}
[5,6]
5
6
Success =       true
{code}
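Incidentally, the column order in {{[5,6]}} appears to follow the alphabetical order of the bean properties ({{a}} before {{b}}), since {{createDataFrame(List, Class)}} discovers the schema through JavaBean introspection, which reports properties in sorted order. A plain-JDK sketch of that discovery (no Spark needed; the nested {{CustomClass}} here just mirrors the one above):

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

public class BeanCheck {
    // Same shape as the CustomClass used above.
    public static class CustomClass implements Serializable {
        private int a;
        public void setA(int value) { this.a = value; }
        public int getA() { return this.a; }

        private int b;
        public void setB(int value) { this.b = value; }
        public int getB() { return this.b; }
    }

    public static void main(String[] args) throws IntrospectionException {
        // Stop at Object.class so we only see CustomClass's own properties:
        BeanInfo info = Introspector.getBeanInfo(CustomClass.class, Object.class);
        List<String> names = new ArrayList<>();
        for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
            names.add(pd.getName());
        }
        // Introspector reports properties in sorted (alphabetical) order:
        System.out.println(names);
    }
}
```

So if your fields print in an unexpected order, check the property names, not the declaration order.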
  
In the future, please make sure the issue isn't in your own application 
before opening a JIRA ticket. Also, as an aside, I really recommend picking up 
some Scala, as IMO the Scala API is much friendlier, especially around the 
edges for things like the Avro library.
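As for the {{ClassCastException}} in the original report: the behavior described there is what Java's type erasure produces. The unchecked generic cast inside a method like {{datum()}} is a no-op at runtime, so the failure only surfaces at the typed assignment, where the compiler inserts a checked cast. A plain-JDK sketch of that mechanism (no Avro involved; the {{HashMap}} just stands in for a generic record):

```java
import java.util.HashMap;

public class ErasureDemo {
    static class MyCustomClass { }

    // Stand-in for AvroKey<T>.datum(): after erasure this returns Object,
    // and the unchecked cast inside it is a no-op at runtime.
    @SuppressWarnings("unchecked")
    static <T> T datum(Object stored) {
        return (T) stored; // does not fail here
    }

    public static void main(String[] args) {
        // Simulate a reader that actually produced a generic record:
        Object generic = new HashMap<String, Object>();
        try {
            // The compiler inserts a checked cast at this assignment:
            MyCustomClass first = ErasureDemo.<MyCustomClass>datum(generic);
            System.out.println("cast succeeded: " + first);
        } catch (ClassCastException e) {
            System.out.println("ClassCastException at the assignment site");
        }
    }
}
```

That is why the code compiles fine and the debugger shows a {{GenericData$Record}}: the reader never produced the custom class in the first place, and the cast only trips when the value is assigned.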


was (Author: emaynard):
Here is a complete working example in Java:

{code:title=AvroTest.java|borderStyle=solid}
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import java.util.ArrayList;

public class AvroTest {

    public static void main(String[] args) {

        // build the Spark session:
        System.setProperty("hadoop.home.dir", "C:\\Hadoop"); // Windows hack
        SparkSession spark = SparkSession.builder().master("local").appName("Avro Test")
                .config("spark.sql.warehouse.dir", "file:///c:/tmp/spark-warehouse") // another Windows hack
                .getOrCreate();

        // create data:
        ArrayList<CustomClass> list = new ArrayList<>();
        CustomClass cc = new CustomClass();
        cc.setValue(5);
        list.add(cc);
        spark.createDataFrame(list, CustomClass.class).write().format("com.databricks.spark.avro").save("C:\\tmp\\file.avro");

        // read data:
        Row row = spark.read().format("com.databricks.spark.avro").load("C:\\tmp\\file.avro").head();
        System.out.println("Success =\t" + ((Integer) row.get(0) == 5));
    }
}
{code}

With a simple custom class:
{code:title=CustomClass.java|borderStyle=solid}
import java.io.Serializable;

public class CustomClass implements Serializable {
    public int value;
    public void setValue(int value){this.value = value;}
    public int getValue(){return this.value;}
}
{code}  
  
Everything looks OK to me, and the main function prints "Success = true". In 
the future, please make sure the issue isn't in your own application 
before opening a JIRA ticket. Also, as an aside, I really recommend picking up 
some Scala, as IMO the Scala API is much friendlier, especially around the 
edges for things like the Avro library.

> Can't load custom type from avro file to RDD with newAPIHadoopFile
> ------------------------------------------------------------------
>
>                 Key: SPARK-19656
>                 URL: https://issues.apache.org/jira/browse/SPARK-19656
>             Project: Spark
>          Issue Type: Question
>          Components: Java API
>    Affects Versions: 2.0.2
>            Reporter: Nira Amit
>
> If I understand correctly, in scala it's possible to load custom objects from 
> avro files to RDDs this way:
> {code}
> ctx.hadoopFile("/path/to/the/avro/file.avro",
>   classOf[AvroInputFormat[MyClassInAvroFile]],
>   classOf[AvroWrapper[MyClassInAvroFile]],
>   classOf[NullWritable])
> {code}
> I'm not a scala developer, so I tried to "translate" this to java as best I 
> could. I created classes that extend AvroKey and FileInputFormat:
> {code}
> public static class MyCustomAvroKey extends AvroKey<MyCustomClass>{};
> public static class MyCustomAvroReader extends 
> AvroRecordReaderBase<MyCustomAvroKey, NullWritable, MyCustomClass> {
> // with my custom schema and all the required methods...
>     }
> public static class MyCustomInputFormat extends 
> FileInputFormat<MyCustomAvroKey, NullWritable>{
>         @Override
>         public RecordReader<MyCustomAvroKey, NullWritable> 
> createRecordReader(InputSplit inputSplit, TaskAttemptContext 
> taskAttemptContext) throws IOException, InterruptedException {
>             return new MyCustomAvroReader();
>         }
>     }
> ...
> JavaPairRDD<MyCustomAvroKey, NullWritable> records =
>                 sc.newAPIHadoopFile("file:/path/to/datafile.avro",
>                         MyCustomInputFormat.class, MyCustomAvroKey.class,
>                         NullWritable.class,
>                         sc.hadoopConfiguration());
> MyCustomClass first = records.first()._1.datum();
> System.out.println("Got a result, some custom field: " + 
> first.getSomeCustomField());
> {code}
> This compiles fine, but using a debugger I can see that `first._1.datum()` 
> actually returns a `GenericData$Record` in runtime, not a `MyCustomClass` 
> instance.
> And indeed, when the following line executes:
> {code}
> MyCustomClass first = records.first()._1.datum();
> {code}
> I get an exception:
> {code}
> java.lang.ClassCastException: org.apache.avro.generic.GenericData$Record 
> cannot be cast to my.package.containing.MyCustomClass
> {code}
> Am I doing it wrong? Or is this not possible in Java?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
