Ella created NIFI-4236:
--------------------------

             Summary: Exception in a program that deserializes/reads the content of an Avro file with Spark SQL
                 Key: NIFI-4236
                 URL: https://issues.apache.org/jira/browse/NIFI-4236
             Project: Apache NiFi
          Issue Type: Bug
    Affects Versions: 1.3.0
         Environment: Windows 10, IntelliJ IDEA
            Reporter: Ella


Hi Guys,

I have written a Java program to read the content of an Avro file with Spark 
SQL in IntelliJ; however, I get the exception below. I have checked the type 
of the file as well as its path several times, and I am still confused by 
this issue.
I would greatly appreciate it if anyone could guide me accordingly.

Thanks a lot in advance.
Regards,
Ella

// Here is the Java code



import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

/**
 * Created by asus on 7/24/2017.
 */
public class AvroTestFinal {

    public static void main(String[] args) {

        System.setProperty("hadoop.home.dir", "c:\\winutil\\");
        SparkConf sparkConf = new SparkConf().setMaster("local").setAppName("TEST");
        JavaSparkContext sparkw = new JavaSparkContext(sparkConf);
        // Also read Avro files whose names lack the .avro extension
        sparkw.hadoopConfiguration().setBoolean("avro.mapred.ignore.inputs.without.extension", false);

        SparkContext spark = JavaSparkContext.toSparkContext(sparkw);
        SparkSession sparkSession = new SparkSession(spark);

        // Create a DataFrame from the specified file and register it as a
        // temporary view so the SQL query below can refer to it as "avroTable"
        Dataset<Row> avroDf = sparkSession.read()
                .format("com.databricks.spark.avro")
                .load("C:/Users/asus/IdeaProjects/comavro/avro/509048290108207.avro");
        avroDf.createOrReplaceTempView("avroTable");

        Dataset<Row> tableRowsDataFrame = sparkSession.sql("select count(*) as total from avroTable");
        tableRowsDataFrame.show();
    }
}
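The "Not an Avro data file" exception in the trace below is thrown by Avro's DataFileReader.openReader when the first bytes of the file are not the Avro object-container magic (the ASCII characters 'O', 'b', 'j' followed by the byte 1). One way to rule out a corrupt or mislabeled file, independent of Spark, is to inspect those bytes directly. This is a minimal sketch; the class name and the command-line path argument are made up for illustration:

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.util.Arrays;

public class AvroHeaderCheck {

    // Avro object container files start with this 4-byte magic: 'O', 'b', 'j', 0x01
    static final byte[] AVRO_MAGIC = {'O', 'b', 'j', 1};

    // Returns true only if the given bytes are exactly the Avro magic
    static boolean isAvroHeader(byte[] firstFourBytes) {
        return Arrays.equals(firstFourBytes, AVRO_MAGIC);
    }

    public static void main(String[] args) throws IOException {
        byte[] header = new byte[4];
        try (FileInputStream in = new FileInputStream(args[0])) {
            if (in.read(header) == 4 && isAvroHeader(header)) {
                System.out.println("Valid Avro container header: " + args[0]);
            } else {
                System.out.println("Not an Avro container file: " + args[0]);
            }
        }
    }
}
```

If this check fails on the file, the problem is the file itself (for example, a JSON or Parquet file saved with an .avro name), not the Spark code.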

// Here is the aforementioned exception

Exception in thread "main" java.io.IOException: Not an Avro data file
        at org.apache.avro.file.DataFileReader.openReader(DataFileReader.java:63)
        at com.databricks.spark.avro.DefaultSource$$anonfun$5.apply(DefaultSource.scala:80)
        at com.databricks.spark.avro.DefaultSource$$anonfun$5.apply(DefaultSource.scala:77)
        at scala.Option.getOrElse(Option.scala:121)
        at com.databricks.spark.avro.DefaultSource.inferSchema(DefaultSource.scala:77)
        at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$7.apply(DataSource.scala:184)
        at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$7.apply(DataSource.scala:184)
        at scala.Option.orElse(Option.scala:289)
        at org.apache.spark.sql.execution.datasources.DataSource.org$apache$spark$sql$execution$datasources$DataSource$$getOrInferFileFormatSchema(DataSource.scala:183)
        at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:387)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:152)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:135)
        at AvroTestFinal.main(AvroTestFinal.java:40)
17/07/28 21:10:31 INFO SparkContext: Invoking stop() from shutdown hook
17/07/28 21:10:31 INFO SparkUI: Stopped Spark web UI at http://192.168.209.1:4040
17/07/28 21:10:31 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
17/07/28 21:10:31 INFO MemoryStore: MemoryStore cleared
17/07/28 21:10:31 INFO BlockManager: BlockManager stopped
17/07/28 21:10:31 INFO BlockManagerMaster: BlockManagerMaster stopped
17/07/28 21:10:31 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
17/07/28 21:10:31 INFO SparkContext: Successfully stopped SparkContext
17/07/28 21:10:31 INFO ShutdownHookManager: Shutdown hook called
17/07/28 21:10:31 INFO ShutdownHookManager: Deleting directory C:\Users\asus\AppData\Local\Temp\spark-83a8dd26-f9a7-4f76-8806-12e3a9dc94f2

Process finished with exit code 1




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
