codejoyan opened a new issue #3499:
URL: https://github.com/apache/hudi/issues/3499


   **Environment**
   Hudi Version - 0.7
   Spark - 2.4.7
   DFS - Google Cloud storage
   
   **Inline Clustering Enabled**
   To lower ingestion latency without compromising query performance, I have deployed code with inline clustering enabled for the subsequent incremental runs. When the job runs with clustering enabled, it fails with the error below.
   
   I have no idea why the number of partitions is -10. Please help me debug this.
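   For reference, the writer options are roughly as follows (a sketch only: the table name, path, and numeric values are placeholders, not the real job's settings; the `hoodie.clustering.*` keys are the standard Hudi 0.7 clustering configs):
   ```scala
   // Sketch of the Hudi write with inline clustering enabled.
   // `df` is the incremental batch DataFrame; names/values below are illustrative.
   import org.apache.spark.sql.SaveMode

   df.write.format("hudi").
     option("hoodie.table.name", "sales_base").                 // placeholder table name
     option("hoodie.datasource.write.operation", "upsert").
     option("hoodie.clustering.inline", "true").                // enable inline clustering
     option("hoodie.clustering.inline.max.commits", "4").       // trigger clustering every N commits
     option("hoodie.clustering.plan.strategy.target.file.max.bytes", "1073741824"). // ~1 GB target files
     option("hoodie.clustering.plan.strategy.small.file.limit", "629145600").       // files under ~600 MB are candidates
     mode(SaveMode.Append).
     save("gs://bucket/path/sales_base")                        // placeholder GCS path
   ```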
   **Stacktrace of the error**
   ```
   21/08/19 06:08:34 INFO timeline.HoodieActiveTimeline: Loaded instants 
[[20210818063404__commit__COMPLETED], [20210818064709__commit__COMPLETED], 
[20210818071622__commit__COMPLETED], [20210818072722__commit__COMPLETED], 
[20210818073610__commit__COMPLETED], [20210818074601__commit__COMPLETED], 
[20210818080912__commit__COMPLETED], [20210818083622__commit__COMPLETED], 
[20210819054628__rollback__COMPLETED], [20210819060506__commit__COMPLETED], 
[==>20210819060829__replacecommit__INFLIGHT]]
   **21/08/19 06:08:34 ERROR util.GlobalVar$: 
java.lang.IllegalArgumentException: requirement failed: Number of partitions 
cannot be negative but found -10**.
   21/08/19 06:08:34 ERROR yarn.ApplicationMaster: User class threw exception: 
java.lang.Exception: Query execution failed in transformAndLoadBaseTable
   java.lang.Exception: Query execution failed in transformAndLoadBaseTable
        at 
com.walmart.finwb.salesbaseload.SalesLoadBaseTables$.transformAndLoadBaseTable(SalesLoadBaseTables.scala:323)
        at 
com.walmart.finwb.salesbaseload.SalesLoadBaseTables$.main(SalesLoadBaseTables.scala:116)
        at 
com.walmart.finwb.salesbaseload.SalesLoadBaseTables.main(SalesLoadBaseTables.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at 
org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:686)
   Caused by: java.lang.IllegalArgumentException: requirement failed: Number of 
partitions cannot be negative but found -10.
        at scala.Predef$.require(Predef.scala:224)
        at org.apache.spark.RangePartitioner.<init>(Partitioner.scala:155)
        at org.apache.spark.RangePartitioner.<init>(Partitioner.scala:151)
        at 
org.apache.spark.rdd.OrderedRDDFunctions$$anonfun$sortByKey$1.apply(OrderedRDDFunctions.scala:62)
        at 
org.apache.spark.rdd.OrderedRDDFunctions$$anonfun$sortByKey$1.apply(OrderedRDDFunctions.scala:61)
        at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:385)
        at 
org.apache.spark.rdd.OrderedRDDFunctions.sortByKey(OrderedRDDFunctions.scala:61)
        at org.apache.spark.rdd.RDD$$anonfun$sortBy$1.apply(RDD.scala:645)
        at org.apache.spark.rdd.RDD$$anonfun$sortBy$1.apply(RDD.scala:646)
        at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:385)
        at org.apache.spark.rdd.RDD.sortBy(RDD.scala:643)
        at org.apache.spark.api.java.JavaRDD.sortBy(JavaRDD.scala:206)
        at 
org.apache.hudi.execution.bulkinsert.GlobalSortPartitioner.repartitionRecords(GlobalSortPartitioner.java:41)
        at 
org.apache.hudi.execution.bulkinsert.GlobalSortPartitioner.repartitionRecords(GlobalSortPartitioner.java:34)
        at 
org.apache.hudi.table.action.commit.SparkBulkInsertHelper.bulkInsert(SparkBulkInsertHelper.java:103)
        at 
org.apache.hudi.client.clustering.run.strategy.SparkSortAndSizeExecutionStrategy.performClustering(SparkSortAndSizeExecutionStrategy.java:74)
        at 
org.apache.hudi.client.clustering.run.strategy.SparkSortAndSizeExecutionStrategy.performClustering(SparkSortAndSizeExecutionStrategy.java:50)
        at 
org.apache.hudi.table.action.cluster.SparkExecuteClusteringCommitActionExecutor.lambda$runClusteringForGroupAsync$3(SparkExecuteClusteringCommitActionExecutor.java:121)
        at 
java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
        at 
java.util.concurrent.CompletableFuture$AsyncSupply.exec(CompletableFuture.java:1596)
        at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
        at 
java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
        at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
        at 
java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
   21/08/19 06:08:34 INFO yarn.ApplicationMaster: Final app status: FAILED, 
exitCode: 15, (reason: User class threw exception: java.lang.Exception: Query 
execution failed in transformAndLoadBaseTable
   ```
   Let me know if you need any further information.
   

