[GitHub] [hudi] prashanthpdesai edited a comment on issue #1695: [SUPPORT] : Global Bloom Index config issue

GitBox Wed, 03 Jun 2020 10:31:08 -0700


prashanthpdesai edited a comment on issue #1695:
URL: https://github.com/apache/hudi/issues/1695#issuecomment-638335917



   @nsivabalan: thank you, I was able to write successfully with the global index after pointing to the newer version of the jar, but I see the exception below while reading the Parquet files.
   Could you please check whether this is something you can help with?
   
   I'm not sure why it's trying to read the .commit file, which is what causes the magic-byte exception.
   
   spark.read.parquet(basepath+"/*").show(false)
   
   **Caused by: org.apache.spark.SparkException: Exception thrown in awaitResult:**
     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
     at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
     at org.apache.spark.scheduler.Task.run(Task.scala:123)
     at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.org$apache$spark$executor$Executor$TaskRunner$$anonfun$$res$1(Executor.scala:412)
     at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:419)
     at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1359)
     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:430)
     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
     at java.lang.Thread.run(Thread.java:748)
   **Caused by: java.io.IOException: Could not read footer for file: FileStatus{path=maprfs:///datalake/globalndextest0604/.hoodie/20200603115556.commit;** isDirectory=false; length=4366; replication=0; blocksize=0; modification_time=0; access_time=0; owner=; group=; permission=rw-rw-rw-; isSymlink=false}
     at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anonfun$readParquetFootersInParallel$1.apply(ParquetFileFormat.scala:551)
     at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anonfun$readParquetFootersInParallel$1.apply(ParquetFileFormat.scala:538)
     at org.apache.spark.util.ThreadUtils$$anonfun$3$$anonfun$apply$1.apply(ThreadUtils.scala:287)
     at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
     at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
     at scala.concurrent.impl.ExecutionContextImpl$AdaptedForkJoinTask.exec(ExecutionContextImpl.scala:121)
     at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
     at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
     at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
     at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
   **Caused by: java.lang.RuntimeException: maprfs:///datalake/globalndextest0604/.hoodie/20200603115556.commit is not a Parquet file. expected magic number at tail [80, 65, 82, 49] but found [32, 48, 10, 125]**
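   The two byte arrays in the last `Caused by` decode as ASCII: [80, 65, 82, 49] is "PAR1", the magic number Parquet writes at the end of every file, while [32, 48, 10, 125] is ` 0\n}`, which looks like the tail of a JSON document. That fits the failing path, since Hudi keeps commit metadata in `.hoodie/*.commit` as JSON rather than Parquet. A minimal Scala sketch of that decoding (the object and method names are illustrative only, not part of Spark or Hudi):

```scala
import java.nio.charset.StandardCharsets

// Illustrative helper, not a Spark or Hudi API.
object ParquetMagic {
  // Parquet files start and end with the 4-byte ASCII magic "PAR1".
  val Magic: Seq[Int] =
    "PAR1".getBytes(StandardCharsets.US_ASCII).toSeq.map(_.toInt)

  // Render tail bytes as text, escaping newlines so they stay visible.
  def show(bytes: Seq[Int]): String =
    bytes.map {
      case 10 => "\\n"
      case b  => b.toChar.toString
    }.mkString

  // True only when the tail bytes equal the Parquet magic number.
  def hasParquetTail(tail: Seq[Int]): Boolean = tail == Magic
}
```

   For example, `ParquetMagic.show(Seq(32, 48, 10, 125))` yields ` 0\n}` (with the newline escaped), and `ParquetMagic.hasParquetTail` is false for those bytes, which is exactly the check the Parquet footer reader failed on.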
   
   
   
   Listing of the .hoodie folder:
   /basepath/.hoodie/
   drwxr-sr-x. 2 xxx xgc    0 Jun  3 11:55 archived
   -rwxr-xr-x. 1 xxx xgc  207 Jun  3 11:55 hoodie.properties
   -rwxr-xr-x. 1 xxx xgc    0 Jun  3 11:55 20200603115556.commit.requested
   -rwxr-xr-x. 1 xxx xgc  380 Jun  3 11:56 20200603115556.inflight
   -rwxr-xr-x. 1 xxx xgc 4366 Jun  3 11:56 20200603115556.commit
   -rwxr-xr-x. 1 xxx xgc    0 Jun  3 11:57 20200603115719.commit.requested
   -rwxr-xr-x. 1 xxx xgc  380 Jun  3 11:57 20200603115719.inflight
   -rwxr-xr-x. 1 xxx xgc 5906 Jun  3 11:57 20200603115719.commit
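   The listing shows why the glob `basepath+"/*"` is a problem: it also matches the `.hoodie` folder, and Spark then hands every file inside it (including the JSON `*.commit` files) to the Parquet reader. A minimal sketch of filtering a file list before reading, assuming the paths have already been listed somehow (for example via a Hadoop `FileSystem` listing); `HoodieDataFiles` and its methods are hypothetical names for illustration:

```scala
// Hypothetical helper, not a Spark or Hudi API.
object HoodieDataFiles {
  // True when any path segment is the Hudi metadata folder ".hoodie",
  // where timeline files such as *.commit and *.inflight live.
  def isUnderHoodieMeta(path: String): Boolean =
    path.split('/').contains(".hoodie")

  // Keep only *.parquet files outside .hoodie, i.e. the candidates a
  // plain Parquet reader can open without hitting timeline files.
  def parquetDataFiles(paths: Seq[String]): Seq[String] =
    paths.filter(p => !isUnderHoodieMeta(p) && p.endsWith(".parquet"))
}
```

   The filtered list could then be passed as varargs, e.g. `spark.read.parquet(HoodieDataFiles.parquetDataFiles(paths): _*)`. This only keeps the timeline files out: a copy-on-write table can still hold older versions of each file group until cleaning runs, so a plain Parquet read may return superseded rows. Hudi's documentation describes reading through the Hudi datasource, or registering `HoodieROTablePathFilter` as the input path filter, to select only the latest file versions; check the docs for the Hudi version in use.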
    

