aznwarmonkey opened a new issue #4803: URL: https://github.com/apache/hudi/issues/4803
Hello, I am trying to run clustering and the job fails without much indication as to why. Here's the command I am using to run clustering:

```sh
spark-submit \
  --class org.apache.hudi.utilities.HoodieClusteringJob \
  /usr/lib/hudi/hudi-utilities-bundle.jar \
  --props s3://path-to-test/clustering.properties \
  --mode scheduleAndExecute \
  --base-path s3://path-to-test/data/hudi/test/country/ \
  --table-name country \
  --spark-memory 1g
```

Here's the properties file:

```
hoodie.clustering.async.enabled=true
hoodie.clustering.async.max.commits=1
hoodie.clustering.plan.strategy.target.file.max.bytes=1073741824
hoodie.clustering.plan.strategy.small.file.limit=629145600
hoodie.clustering.execution.strategy.class=org.apache.hudi.client.clustering.run.strategy.SparkSortAndSizeExecutionStrategy
hoodie.clustering.plan.strategy.sort.columns=enrich_selector_id
```

And here's the console output of the clustering job:

```shell
22/02/13 20:00:48 INFO TaskSetManager: Starting task 0.0 in stage 3.0 (TID 1541, ip-172-31-74-236.ec2.internal, executor 3, partition 0, PROCESS_LOCAL, 7927 bytes)
22/02/13 20:00:48 INFO TaskSetManager: Starting task 1.0 in stage 3.0 (TID 1542, ip-172-31-69-5.ec2.internal, executor 2, partition 1, PROCESS_LOCAL, 7922 bytes)
22/02/13 20:00:48 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on ip-172-31-74-236.ec2.internal:46827 (size: 102.2 KB, free: 4.8 GB)
22/02/13 20:00:48 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on ip-172-31-69-5.ec2.internal:43517 (size: 102.2 KB, free: 366.1 MB)
22/02/13 20:00:49 INFO S3NativeFileSystem: Opening 's3://path-to-test/data/hudi/test/country/.hoodie/hoodie.properties' for reading
22/02/13 20:00:50 INFO S3NativeFileSystem: Opening 's3://path-to-test/data/hudi/test/country/.hoodie/20220213084348.replacecommit' for reading
22/02/13 20:00:50 INFO S3NativeFileSystem: Opening 's3://path-to-test/data/hudi/test/country/.hoodie/20220213103928.replacecommit' for reading
22/02/13 20:00:50 INFO S3NativeFileSystem: Opening 's3://path-to-test/data/hudi/test/country/.hoodie/20220213122909.replacecommit' for reading
22/02/13 20:00:50 INFO S3NativeFileSystem: Opening 's3://path-to-test/data/hudi/test/country/.hoodie/20220213142348.replacecommit' for reading
22/02/13 20:00:50 INFO S3NativeFileSystem: Opening 's3://path-to-test/data/hudi/test/country/.hoodie/20220213162102.replacecommit' for reading
22/02/13 20:00:50 INFO S3NativeFileSystem: Opening 's3://path-to-test/data/hudi/test/country/.hoodie/20220213181256.replacecommit.requested' for reading
22/02/13 20:00:50 INFO TaskSetManager: Finished task 0.0 in stage 3.0 (TID 1541) in 1859 ms on ip-172-31-74-236.ec2.internal (executor 3) (1/2)
22/02/13 20:00:50 INFO TaskSetManager: Finished task 1.0 in stage 3.0 (TID 1542) in 1865 ms on ip-172-31-69-5.ec2.internal (executor 2) (2/2)
22/02/13 20:00:50 INFO YarnScheduler: Removed TaskSet 3.0, whose tasks have all completed, from pool
22/02/13 20:00:50 INFO DAGScheduler: ResultStage 3 (collect at HoodieSparkEngineContext.java:78) finished in 1.891 s
22/02/13 20:00:50 INFO DAGScheduler: Job 3 finished: collect at HoodieSparkEngineContext.java:78, took 1.893633 s
22/02/13 20:00:50 INFO Javalin: Stopping Javalin ...
22/02/13 20:00:50 INFO Javalin: Javalin has stopped
22/02/13 20:00:50 INFO S3NativeFileSystem: Opening 's3://path-to-test/data/hudi/test/country/.hoodie/20220213084348.replacecommit' for reading
22/02/13 20:00:50 INFO S3NativeFileSystem: Opening 's3://path-to-test/data/hudi/test/country/.hoodie/20220213103928.replacecommit' for reading
22/02/13 20:00:50 INFO S3NativeFileSystem: Opening 's3://path-to-test/data/hudi/test/country/.hoodie/20220213122909.replacecommit' for reading
22/02/13 20:00:50 INFO S3NativeFileSystem: Opening 's3://path-to-test/data/hudi/test/country/.hoodie/20220213142348.replacecommit' for reading
22/02/13 20:00:51 INFO S3NativeFileSystem: Opening 's3://path-to-test/data/hudi/test/country/.hoodie/20220213162102.replacecommit' for reading
22/02/13 20:00:51 INFO S3NativeFileSystem: Opening 's3://path-to-test/data/hudi/test/country/.hoodie/20220213181256.replacecommit.requested' for reading
22/02/13 20:00:51 ERROR HoodieClusteringJob: Clustering with basePath: s3://path-to-test/data/hudi/test/country/, tableName: country, runningMode: scheduleAndExecute failed
22/02/13 20:00:51 INFO SparkUI: Stopped Spark web UI at http://ip-172-31-66-151.ec2.internal:4041
22/02/13 20:00:51 INFO YarnClientSchedulerBackend: Interrupting monitor thread
22/02/13 20:00:51 INFO YarnClientSchedulerBackend: Shutting down all executors
22/02/13 20:00:51 INFO YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
22/02/13 20:00:51 INFO SchedulerExtensionServices: Stopping SchedulerExtensionServices (serviceOption=None, services=List(), started=false)
22/02/13 20:00:51 INFO YarnClientSchedulerBackend: Stopped
22/02/13 20:00:51 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
22/02/13 20:00:51 INFO MemoryStore: MemoryStore cleared
22/02/13 20:00:51 INFO BlockManager: BlockManager stopped
22/02/13 20:00:51 INFO BlockManagerMaster: BlockManagerMaster stopped
22/02/13 20:00:51 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
22/02/13 20:00:51 INFO SparkContext: Successfully stopped SparkContext
22/02/13 20:00:51 INFO ShutdownHookManager: Shutdown hook called
22/02/13 20:00:51 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-3dd6c522-17c3-48a5-a809-8a0ad56c6da7
22/02/13 20:00:51 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-f841de1d-4101-49bc-8178-7bc79ede16b3
```

Does anyone have any insight into why this error is happening?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
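Since the `ERROR HoodieClusteringJob ... failed` line above doesn't print the underlying exception, one thing that may help is re-running the job with Hudi's driver-side logging turned up so the actual stack trace is emitted. This is only a sketch: it uses the standard log4j 1.x override mechanism that Spark 2.x supports, and the file path and logger levels chosen here are assumptions, not anything from the original report.

```sh
# Assumed log4j 1.x override (path /tmp/hudi-debug-log4j.properties is arbitrary):
# raise org.apache.hudi logging to DEBUG so the exception behind the
# generic "Clustering ... failed" message is printed with its stack trace.
cat > /tmp/hudi-debug-log4j.properties <<'EOF'
log4j.rootLogger=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
log4j.logger.org.apache.hudi=DEBUG
EOF

# Same clustering command as above, with the log4j override applied
# to the driver JVM via -Dlog4j.configuration.
spark-submit \
  --class org.apache.hudi.utilities.HoodieClusteringJob \
  --driver-java-options "-Dlog4j.configuration=file:/tmp/hudi-debug-log4j.properties" \
  /usr/lib/hudi/hudi-utilities-bundle.jar \
  --props s3://path-to-test/clustering.properties \
  --mode scheduleAndExecute \
  --base-path s3://path-to-test/data/hudi/test/country/ \
  --table-name country \
  --spark-memory 1g
```

With DEBUG logging the driver log should show the exception that caused `HoodieClusteringJob` to report failure, which would make the root cause much easier to pin down.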
