rahulpoptani opened a new issue #2180:
URL: https://github.com/apache/hudi/issues/2180
**Describe the problem you faced**
Hi Team,
I've written a MERGE_ON_READ table and performed a few upserts and deletes as described in the documentation.
When reading the Hudi output with the Spark DataSource, I can read the older version of the data using the
`hoodie.datasource.query.type=read_optimized` option, but reading the same dataset with
`hoodie.datasource.query.type=snapshot` fails with the error attached below.
Here is the statement I'm executing:
import org.apache.hudi.DataSourceReadOptions

val hudiDF = spark.read.format("org.apache.hudi")
  .option(DataSourceReadOptions.QUERY_TYPE_OPT_KEY, DataSourceReadOptions.QUERY_TYPE_SNAPSHOT_OPT_VAL)
  .load("abfss://[email protected]/path/hudi_output_folder/*/")
Here is my hoodie.properties file from the .hoodie folder:
hoodie.compaction.payload.class=org.apache.hudi.common.model.OverwriteWithLatestAvroPayload
hoodie.table.name=hudi_mor
hoodie.archivelog.folder=archived
hoodie.table.type=MERGE_ON_READ
hoodie.table.version=1
hoodie.timeline.layout.version=1
**To Reproduce**
Steps to reproduce the behavior:
1. Write a Hudi dataset (MERGE_ON_READ table type) to ADLS Gen2.
2. Perform upserts/deletes and write them to the same location as in step 1 (a sketch of steps 1-2 follows this list).
3. Read the data back using the Spark DataSource API with the snapshot query type.
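A minimal sketch of steps 1 and 2, roughly following the documentation. The DataFrames `inputDF`/`updatesDF`, the field names "id", "partition", "ts", and `basePath` are all illustrative assumptions, not the actual values used:

```scala
import org.apache.spark.sql.SaveMode
import org.apache.hudi.DataSourceWriteOptions
import org.apache.hudi.config.HoodieWriteConfig

// Step 1: initial write of the MERGE_ON_READ table.
// "id", "partition", "ts" are illustrative field names, and basePath is a
// placeholder for the ADLS Gen2 location; they are not the actual values used.
val basePath = "abfss://<container>@<account>.dfs.core.windows.net/path/hudi_output_folder"

inputDF.write.format("org.apache.hudi")
  .option(HoodieWriteConfig.TABLE_NAME, "hudi_mor")
  .option(DataSourceWriteOptions.TABLE_TYPE_OPT_KEY, DataSourceWriteOptions.MOR_TABLE_TYPE_OPT_VAL)
  .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, "id")
  .option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY, "partition")
  .option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY, "ts")
  .mode(SaveMode.Overwrite)
  .save(basePath)

// Step 2: upsert changed records into the same location.
// Deletes were issued the same way, using DataSourceWriteOptions.DELETE_OPERATION_OPT_VAL.
updatesDF.write.format("org.apache.hudi")
  .option(HoodieWriteConfig.TABLE_NAME, "hudi_mor")
  .option(DataSourceWriteOptions.TABLE_TYPE_OPT_KEY, DataSourceWriteOptions.MOR_TABLE_TYPE_OPT_VAL)
  .option(DataSourceWriteOptions.OPERATION_OPT_KEY, DataSourceWriteOptions.UPSERT_OPERATION_OPT_VAL)
  .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, "id")
  .option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY, "partition")
  .option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY, "ts")
  .mode(SaveMode.Append)
  .save(basePath)
```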
**Expected behavior**
Ideally, I should be able to read back the dataset with all the changes applied when using the snapshot query type.
**Environment Description**
* Hudi version : 0.6.0
* Spark version : 2.4.5
* Hive version : N/A
* Hadoop version : Databricks 6.4 (Apache Spark 2.4.5, Scala 2.11)
* Storage (HDFS/S3/GCS..) : ABFS (ADLS Gen2)
* Running on Docker? (yes/no) : No
**Stacktrace**
ERROR Uncaught throwable from user code: java.lang.ArrayStoreException: org.apache.spark.sql.execution.datasources.SerializableFileStatus
  at scala.runtime.ScalaRunTime$.array_update(ScalaRunTime.scala:90)
  at scala.collection.IterableLike$class.copyToArray(IterableLike.scala:254)
  at scala.collection.AbstractIterable.copyToArray(Iterable.scala:54)
  at scala.collection.TraversableOnce$class.copyToArray(TraversableOnce.scala:278)
  at scala.collection.AbstractTraversable.copyToArray(Traversable.scala:104)
  at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:286)
  at scala.collection.AbstractTraversable.toArray(Traversable.scala:104)
  at org.apache.hudi.MergeOnReadSnapshotRelation.buildFileIndex(MergeOnReadSnapshotRelation.scala:138)
  at org.apache.hudi.MergeOnReadSnapshotRelation.<init>(MergeOnReadSnapshotRelation.scala:72)
  at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:87)
  at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:51)
  at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:351)
  at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:311)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:297)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:214)
  at some.internal.app.path.HudiReader.read(HudiReader.java:29)
  at linea9ed2ae11338492abc70c94b9b10ce2025.$read$$iw$$iw$$iw$$iw$$iw$$iw.<init>(command--1:1)
  at linea9ed2ae11338492abc70c94b9b10ce2025.$read$$iw$$iw$$iw$$iw$$iw.<init>(command--1:44)
  at linea9ed2ae11338492abc70c94b9b10ce2025.$read$$iw$$iw$$iw$$iw.<init>(command--1:46)
  at linea9ed2ae11338492abc70c94b9b10ce2025.$read$$iw$$iw$$iw.<init>(command--1:48)
  at linea9ed2ae11338492abc70c94b9b10ce2025.$read$$iw$$iw.<init>(command--1:50)
  at linea9ed2ae11338492abc70c94b9b10ce2025.$read$$iw.<init>(command--1:52)
  at linea9ed2ae11338492abc70c94b9b10ce2025.$read.<init>(command--1:54)
  at linea9ed2ae11338492abc70c94b9b10ce2025.$read$.<init>(command--1:58)
  at linea9ed2ae11338492abc70c94b9b10ce2025.$read$.<clinit>(command--1)
  at linea9ed2ae11338492abc70c94b9b10ce2025.$eval$.$print$lzycompute(<notebook>:7)
  at linea9ed2ae11338492abc70c94b9b10ce2025.$eval$.$print(<notebook>:6)
  at linea9ed2ae11338492abc70c94b9b10ce2025.$eval.$print(<notebook>)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:793)
  at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1054)
  at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:645)
  at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:644)
  at scala.reflect.internal.util.ScalaClassLoader$class.asContext(ScalaClassLoader.scala:31)
  at scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:19)
  at scala.tools.nsc.interpreter.IMain$WrappedRequest.loadAndRunReq(IMain.scala:644)
  at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:576)
  at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:572)
  at com.databricks.backend.daemon.driver.DriverILoop.execute(DriverILoop.scala:215)
  at com.databricks.backend.daemon.driver.ScalaDriverLocal$$anonfun$repl$1.apply$mcV$sp(ScalaDriverLocal.scala:202)
  at com.databricks.backend.daemon.driver.ScalaDriverLocal$$anonfun$repl$1.apply(ScalaDriverLocal.scala:202)
  at com.databricks.backend.daemon.driver.ScalaDriverLocal$$anonfun$repl$1.apply(ScalaDriverLocal.scala:202)
  at com.databricks.backend.daemon.driver.DriverLocal$TrapExitInternal$.trapExit(DriverLocal.scala:714)
  at com.databricks.backend.daemon.driver.DriverLocal$TrapExit$.apply(DriverLocal.scala:667)
  at com.databricks.backend.daemon.driver.ScalaDriverLocal.repl(ScalaDriverLocal.scala:202)
  at com.databricks.backend.daemon.driver.DriverLocal$$anonfun$execute$9.apply(DriverLocal.scala:396)
  at com.databricks.backend.daemon.driver.DriverLocal$$anonfun$execute$9.apply(DriverLocal.scala:373)
  at com.databricks.logging.UsageLogging$$anonfun$withAttributionContext$1.apply(UsageLogging.scala:238)
  at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
  at com.databricks.logging.UsageLogging$class.withAttributionContext(UsageLogging.scala:233)
  at com.databricks.backend.daemon.driver.DriverLocal.withAttributionContext(DriverLocal.scala:49)
  at com.databricks.logging.UsageLogging$class.withAttributionTags(UsageLogging.scala:275)
  at com.databricks.backend.daemon.driver.DriverLocal.withAttributionTags(DriverLocal.scala:49)
  at com.databricks.backend.daemon.driver.DriverLocal.execute(DriverLocal.scala:373)
  at com.databricks.backend.daemon.driver.DriverWrapper$$anonfun$tryExecutingCommand$2.apply(DriverWrapper.scala:644)
  at com.databricks.backend.daemon.driver.DriverWrapper$$anonfun$tryExecutingCommand$2.apply(DriverWrapper.scala:644)
  at scala.util.Try$.apply(Try.scala:192)
  at com.databricks.backend.daemon.driver.DriverWrapper.tryExecutingCommand(DriverWrapper.scala:639)
  at com.databricks.backend.daemon.driver.DriverWrapper.getCommandOutputAndError(DriverWrapper.scala:485)
  at com.databricks.backend.daemon.driver.DriverWrapper.executeCommand(DriverWrapper.scala:597)
  at com.databricks.backend.daemon.driver.DriverWrapper.runInnerLoop(DriverWrapper.scala:390)
  at com.databricks.backend.daemon.driver.DriverWrapper.runInner(DriverWrapper.scala:337)
  at com.databricks.backend.daemon.driver.DriverWrapper.run(DriverWrapper.scala:219)
  at java.lang.Thread.run(Thread.java:748)