prashanthpdesai edited a comment on issue #1695:
URL: https://github.com/apache/hudi/issues/1695#issuecomment-638335917
@nsivabalan : Thank you. After pointing to the newer version of the jar, I was
able to write successfully with the global index, but I see the exception below
while reading the Parquet files.
Could you please check whether this is something you can help with?
I'm not sure why it is trying to read the .commit file, which is what causes
the magic-number exception.
spark.read.parquet(basepath+"/*").show(false)
**Caused by: org.apache.spark.SparkException: Exception thrown in awaitResult:**
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:123)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.org$apache$spark$executor$Executor$TaskRunner$$anonfun$$res$1(Executor.scala:412)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:419)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1359)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:430)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
**Caused by: java.io.IOException: Could not read footer for file:
FileStatus{path=maprfs:///datalake/globalndextest0604/.hoodie/20200603115556.commit;**
isDirectory=false; length=4366; replication=0; blocksize=0;
modification_time=0; access_time=0; owner=; group=; permission=rw-rw-rw-;
isSymlink=false}
at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anonfun$readParquetFootersInParallel$1.apply(ParquetFileFormat.scala:551)
at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anonfun$readParquetFootersInParallel$1.apply(ParquetFileFormat.scala:538)
at org.apache.spark.util.ThreadUtils$$anonfun$3$$anonfun$apply$1.apply(ThreadUtils.scala:287)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
at scala.concurrent.impl.ExecutionContextImpl$AdaptedForkJoinTask.exec(ExecutionContextImpl.scala:121)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
**Caused by: java.lang.RuntimeException: maprfs:///datalake/globalndextest0604/.hoodie/20200603115556.commit is not a Parquet file. expected magic number at tail [80, 65, 82, 49] but found [32, 48, 10, 125]**
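For reference, the two byte arrays in that message can be decoded as ASCII to see what the reader found at the end of the file. This is only a small sketch; the object and method names (`MagicBytes`, `asAscii`) are made up for illustration, and the byte values are copied from the exception text above.

```scala
import java.nio.charset.StandardCharsets

object MagicBytes {
  // Byte values copied from the exception message; nothing else is assumed.
  val expected: Array[Byte] = Array(80, 65, 82, 49).map(_.toByte)
  val found: Array[Byte]    = Array(32, 48, 10, 125).map(_.toByte)

  // Hypothetical helper: interpret raw bytes as ASCII text.
  def asAscii(bytes: Array[Byte]): String =
    new String(bytes, StandardCharsets.US_ASCII)

  def main(args: Array[String]): Unit = {
    // Print with the newline escaped so both values stay on one line each.
    println(asAscii(expected))
    println(asAscii(found).replace("\n", "\\n"))
  }
}
```

The expected tail decodes to `PAR1`, the Parquet magic number, while the bytes actually found decode to a space, `0`, a newline and `}`. That looks like the end of a text (probably JSON) document, which would fit the .commit file being commit metadata rather than a data file.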
info:
/basepath/.hoodie/
drwxr-sr-x. 2 xxx xgc 0 Jun 3 11:55 archived
-rwxr-xr-x. 1 xxx xgc 207 Jun 3 11:55 hoodie.properties
-rwxr-xr-x. 1 xxx xgc 0 Jun 3 11:55 20200603115556.commit.requested
-rwxr-xr-x. 1 xxx xgc 380 Jun 3 11:56 20200603115556.inflight
-rwxr-xr-x. 1 xxx xgc 4366 Jun 3 11:56 20200603115556.commit
-rwxr-xr-x. 1 xxx xgc 0 Jun 3 11:57 20200603115719.commit.requested
-rwxr-xr-x. 1 xxx xgc 380 Jun 3 11:57 20200603115719.inflight
-rwxr-xr-x. 1 xxx xgc 5906 Jun 3 11:57 20200603115719.commit
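Since everything in the listing above sits under `.hoodie`, one possible workaround is to keep that folder out of the paths handed to the Parquet reader. The sketch below is an assumption, not a confirmed fix: `HoodieMetaFilter`, `isMetadataPath` and `dataPaths` are hypothetical names, and the partition path in the usage note is invented for illustration.

```scala
object HoodieMetaFilter {
  // Hypothetical helper: true when any path segment is the .hoodie metadata folder.
  def isMetadataPath(path: String): Boolean =
    path.split('/').contains(".hoodie")

  // Keep only the paths that do not point into the metadata folder.
  def dataPaths(paths: Seq[String]): Seq[String] =
    paths.filterNot(isMetadataPath)
}
```

For example, given `Seq("maprfs:///datalake/globalndextest0604/.hoodie", "maprfs:///datalake/globalndextest0604/part1")` (the second path is made up), `dataPaths` would keep only the second entry, and the result could then be passed as `spark.read.parquet(paths: _*)`. Reading through the Hudi datasource instead of plain Parquet may well be the more appropriate route; that is not verified here.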