rpatid10 opened a new issue, #9076:
URL: https://github.com/apache/hudi/issues/9076

   I am encountering an error while attempting to read data from a Hudi table incrementally using Spark-shell. Below is the code I am using:
   
       import org.apache.hudi.DataSourceReadOptions._
       import org.apache.hudi.HoodieDataSourceHelpers
       import org.apache.hadoop.fs.{FileSystem, Path}
   
       val conf = spark.sparkContext.hadoopConfiguration
       val fs = FileSystem.get(conf)
       val beginTime = "20230614155000"
       val endTime = "20230615103000"
       val srcPath = "/user/hdfs/test/testT/"
   
       val incViewDF = spark.read.format("org.apache.hudi")
         .option("hoodie.datasource.query.type", "incremental")
         .option("hoodie.datasource.read.begin.instanttime", beginTime)
         .option("hoodie.datasource.read.end.instanttime", endTime)
         .load(srcPath)
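   As a side note, `HoodieDataSourceHelpers` is imported above but not used. A minimal sketch (assuming the same `fs` and `srcPath` values, and a Hudi table present at that path) for listing the commit instants on the table's timeline, which can help confirm whether `beginTime`/`endTime` actually bracket any commits:

       import scala.collection.JavaConverters._

       // Sketch: list commit instants on the table's timeline.
       // "000" means "since the beginning of the timeline".
       val commits = HoodieDataSourceHelpers.listCommitsSince(fs, srcPath, "000").asScala
       println(s"Commits on timeline: ${commits.mkString(", ")}")
       println(s"Latest commit: ${HoodieDataSourceHelpers.latestCommit(fs, srcPath)}")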
   
   However, I am encountering the following error message:
   
        java.lang.NoSuchFieldError: NULL_VALUE
        at org.apache.parquet.avro.AvroSchemaConverter.convertFields(AvroSchemaConverter.java:246)
        at org.apache.parquet.avro.AvroSchemaConverter.convert(AvroSchemaConverter.java:231)
        at org.apache.hudi.common.table.TableSchemaResolver.convertParquetSchemaToAvro(TableSchemaResolver.java:217)
        at org.apache.hudi.common.table.TableSchemaResolver.getTableAvroSchemaFromDataFile(TableSchemaResolver.java:145)
        at org.apache.hudi.common.table.TableSchemaResolver.getTableAvroSchemaWithoutMetadataFields(TableSchemaResolver.java:180)
        at org.apache.hudi.IncrementalRelation.<init>(IncrementalRelation.scala:89)
        at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:95)
        at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:51)
        at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:309)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:156)
        ... 50 elided
   
   
   Spark-shell Command:
   
       spark-shell --jars hudi-spark-bundle_2.11-0.6.0.jar,parquet-avro-1.10.0.jar,avro-1.10.2.jar \
       --conf spark.sql.hive.convertMetastoreParquet=false \
       --conf "spark.serializer=org.apache.spark.serializer.KryoSerializer"
   
   Environment Details:
   
       Spark version: 2.2.0
       Scala version: 2.11.0
       Hive version: 1.2.1000.2.6.3.0-235
   
   Additional Information:
   
   I have already included the necessary JAR files (hudi-spark-bundle_2.11-0.6.0.jar, parquet-avro-1.10.0.jar, avro-1.10.2.jar) while launching Spark-shell. I would appreciate any assistance in resolving this issue.
   
   [Slack Message](https://apache-hudi.slack.com/archives/C4M27T1D5/p1686851524522419?thread_ts=1686851524.522419&cid=C4M27T1D5)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
