rahulpoptani opened a new issue #2180:
URL: https://github.com/apache/hudi/issues/2180


   **Describe the problem you faced**
   Hi Team,
   I've written a MERGE_ON_READ table and performed a few upserts and deletes as described in the documentation.
   When reading the Hudi output with the Spark Datasource, I can read the older version of the data using the "hoodie.datasource.query.type=read_optimized" option. But when I try to read the same dataset with "hoodie.datasource.query.type=snapshot", I get the error attached below.
   
   Here is the statement I'm executing:
   val hudiDF = spark.read.format("org.apache.hudi")
        .option(DataSourceReadOptions.QUERY_TYPE_OPT_KEY, DataSourceReadOptions.QUERY_TYPE_SNAPSHOT_OPT_VAL)
        .load("abfss://[email protected]/path/hudi_output_folder/*/")
   
   Here is my hoodie.properties file from the .hoodie folder:
   hoodie.compaction.payload.class=org.apache.hudi.common.model.OverwriteWithLatestAvroPayload
   hoodie.table.name=hudi_mor
   hoodie.archivelog.folder=archived
   hoodie.table.type=MERGE_ON_READ
   hoodie.table.version=1
   hoodie.timeline.layout.version=1
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. Write a Hudi-format dataset to ADLS Gen2
   2. Perform upserts/deletes and write to the same location as in step 1
   3. Read the data back using the Spark Datasource API
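
   For reference, the steps above can be sketched roughly as follows. This is a minimal sketch, not my exact job: `inputDF`/`updatesDF` and the record-key/precombine field names (`id`, `ts`) are placeholders, and the option constants are the ones documented for the Hudi 0.6.0 Spark Datasource.

   ```scala
   import org.apache.spark.sql.SaveMode
   import org.apache.hudi.{DataSourceReadOptions, DataSourceWriteOptions}
   import org.apache.hudi.config.HoodieWriteConfig

   val basePath = "abfss://[email protected]/path/hudi_output_folder"

   // Step 1: initial write of a MERGE_ON_READ table to ADLS Gen2
   inputDF.write.format("org.apache.hudi")
     .option(HoodieWriteConfig.TABLE_NAME, "hudi_mor")
     .option(DataSourceWriteOptions.TABLE_TYPE_OPT_KEY, DataSourceWriteOptions.MOR_TABLE_TYPE_OPT_VAL)
     .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, "id")   // placeholder key field
     .option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY, "ts")  // placeholder precombine field
     .mode(SaveMode.Overwrite)
     .save(basePath)

   // Step 2: upserts/deletes appended to the same location
   updatesDF.write.format("org.apache.hudi")
     .option(HoodieWriteConfig.TABLE_NAME, "hudi_mor")
     .option(DataSourceWriteOptions.TABLE_TYPE_OPT_KEY, DataSourceWriteOptions.MOR_TABLE_TYPE_OPT_VAL)
     .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, "id")
     .option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY, "ts")
     .option(DataSourceWriteOptions.OPERATION_OPT_KEY, DataSourceWriteOptions.UPSERT_OPERATION_OPT_VAL)
     .mode(SaveMode.Append)
     .save(basePath)

   // Step 3: snapshot read back via the Spark Datasource API (this is where the error occurs)
   val snapshotDF = spark.read.format("org.apache.hudi")
     .option(DataSourceReadOptions.QUERY_TYPE_OPT_KEY, DataSourceReadOptions.QUERY_TYPE_SNAPSHOT_OPT_VAL)
     .load(basePath + "/*/")
   ```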
   
   **Expected behavior**
   
   Ideally, I should be able to read back the dataset with all the changes applied when using the snapshot query type.
   
   **Environment Description**
   
   * Hudi version : 0.6.0
   
   * Spark version : 2.4.5
   
   * Hive version : N/A
   
   * Hadoop version : Databricks 6.4 (Apache Spark 2.4.5, Scala 2.11)
   
   * Storage (HDFS/S3/GCS..) : ABFS (ADLS Gen2)
   
   * Running on Docker? (yes/no) : No
   
   **Stacktrace**
   
   ERROR Uncaught throwable from user code: java.lang.ArrayStoreException: org.apache.spark.sql.execution.datasources.SerializableFileStatus
        at scala.runtime.ScalaRunTime$.array_update(ScalaRunTime.scala:90)
        at scala.collection.IterableLike$class.copyToArray(IterableLike.scala:254)
        at scala.collection.AbstractIterable.copyToArray(Iterable.scala:54)
        at scala.collection.TraversableOnce$class.copyToArray(TraversableOnce.scala:278)
        at scala.collection.AbstractTraversable.copyToArray(Traversable.scala:104)
        at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:286)
        at scala.collection.AbstractTraversable.toArray(Traversable.scala:104)
        at org.apache.hudi.MergeOnReadSnapshotRelation.buildFileIndex(MergeOnReadSnapshotRelation.scala:138)
        at org.apache.hudi.MergeOnReadSnapshotRelation.<init>(MergeOnReadSnapshotRelation.scala:72)
        at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:87)
        at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:51)
        at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:351)
        at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:311)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:297)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:214)
        at some.internal.app.path.HudiReader.read(HudiReader.java:29)
        at linea9ed2ae11338492abc70c94b9b10ce2025.$read$$iw$$iw$$iw$$iw$$iw$$iw.<init>(command--1:1)
        at linea9ed2ae11338492abc70c94b9b10ce2025.$read$$iw$$iw$$iw$$iw$$iw.<init>(command--1:44)
        at linea9ed2ae11338492abc70c94b9b10ce2025.$read$$iw$$iw$$iw$$iw.<init>(command--1:46)
        at linea9ed2ae11338492abc70c94b9b10ce2025.$read$$iw$$iw$$iw.<init>(command--1:48)
        at linea9ed2ae11338492abc70c94b9b10ce2025.$read$$iw$$iw.<init>(command--1:50)
        at linea9ed2ae11338492abc70c94b9b10ce2025.$read$$iw.<init>(command--1:52)
        at linea9ed2ae11338492abc70c94b9b10ce2025.$read.<init>(command--1:54)
        at linea9ed2ae11338492abc70c94b9b10ce2025.$read$.<init>(command--1:58)
        at linea9ed2ae11338492abc70c94b9b10ce2025.$read$.<clinit>(command--1)
        at linea9ed2ae11338492abc70c94b9b10ce2025.$eval$.$print$lzycompute(<notebook>:7)
        at linea9ed2ae11338492abc70c94b9b10ce2025.$eval$.$print(<notebook>:6)
        at linea9ed2ae11338492abc70c94b9b10ce2025.$eval.$print(<notebook>)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:793)
        at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1054)
        at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:645)
        at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:644)
        at scala.reflect.internal.util.ScalaClassLoader$class.asContext(ScalaClassLoader.scala:31)
        at scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:19)
        at scala.tools.nsc.interpreter.IMain$WrappedRequest.loadAndRunReq(IMain.scala:644)
        at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:576)
        at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:572)
        at com.databricks.backend.daemon.driver.DriverILoop.execute(DriverILoop.scala:215)
        at com.databricks.backend.daemon.driver.ScalaDriverLocal$$anonfun$repl$1.apply$mcV$sp(ScalaDriverLocal.scala:202)
        at com.databricks.backend.daemon.driver.ScalaDriverLocal$$anonfun$repl$1.apply(ScalaDriverLocal.scala:202)
        at com.databricks.backend.daemon.driver.ScalaDriverLocal$$anonfun$repl$1.apply(ScalaDriverLocal.scala:202)
        at com.databricks.backend.daemon.driver.DriverLocal$TrapExitInternal$.trapExit(DriverLocal.scala:714)
        at com.databricks.backend.daemon.driver.DriverLocal$TrapExit$.apply(DriverLocal.scala:667)
        at com.databricks.backend.daemon.driver.ScalaDriverLocal.repl(ScalaDriverLocal.scala:202)
        at com.databricks.backend.daemon.driver.DriverLocal$$anonfun$execute$9.apply(DriverLocal.scala:396)
        at com.databricks.backend.daemon.driver.DriverLocal$$anonfun$execute$9.apply(DriverLocal.scala:373)
        at com.databricks.logging.UsageLogging$$anonfun$withAttributionContext$1.apply(UsageLogging.scala:238)
        at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
        at com.databricks.logging.UsageLogging$class.withAttributionContext(UsageLogging.scala:233)
        at com.databricks.backend.daemon.driver.DriverLocal.withAttributionContext(DriverLocal.scala:49)
        at com.databricks.logging.UsageLogging$class.withAttributionTags(UsageLogging.scala:275)
        at com.databricks.backend.daemon.driver.DriverLocal.withAttributionTags(DriverLocal.scala:49)
        at com.databricks.backend.daemon.driver.DriverLocal.execute(DriverLocal.scala:373)
        at com.databricks.backend.daemon.driver.DriverWrapper$$anonfun$tryExecutingCommand$2.apply(DriverWrapper.scala:644)
        at com.databricks.backend.daemon.driver.DriverWrapper$$anonfun$tryExecutingCommand$2.apply(DriverWrapper.scala:644)
        at scala.util.Try$.apply(Try.scala:192)
        at com.databricks.backend.daemon.driver.DriverWrapper.tryExecutingCommand(DriverWrapper.scala:639)
        at com.databricks.backend.daemon.driver.DriverWrapper.getCommandOutputAndError(DriverWrapper.scala:485)
        at com.databricks.backend.daemon.driver.DriverWrapper.executeCommand(DriverWrapper.scala:597)
        at com.databricks.backend.daemon.driver.DriverWrapper.runInnerLoop(DriverWrapper.scala:390)
        at com.databricks.backend.daemon.driver.DriverWrapper.runInner(DriverWrapper.scala:337)
        at com.databricks.backend.daemon.driver.DriverWrapper.run(DriverWrapper.scala:219)
        at java.lang.Thread.run(Thread.java:748)
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

