alexone95 commented on issue #8436: URL: https://github.com/apache/hudi/issues/8436#issuecomment-1510840666
I succeeded in calling the cleaner directly on EMR, but the .commit files in the /.hoodie directory are still there. Am I missing something about how the cleaner works? I expected to find only the last 10 commits. Here is the log output I get from calling the cleaner:

23/04/13 15:48:33 INFO SparkContext: Running Spark version 3.3.0-amzn-1 23/04/13 15:48:33 INFO ResourceUtils: ============================================================== 23/04/13 15:48:33 INFO ResourceUtils: No custom resources configured for spark.driver. 23/04/13 15:48:33 INFO ResourceUtils: ============================================================== 23/04/13 15:48:33 INFO SparkContext: Submitted application: hoodie-cleaner-hudiTable 23/04/13 15:48:33 INFO ResourceProfile: Default ResourceProfile created, executor resources: Map(cores -> name: cores, amount: 4, script: , vendor: , memory -> name: memory, amount: 9108, script: , vendor: , offHeap -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0) 23/04/13 15:48:33 INFO ResourceProfile: Limiting resource is cpus at 4 tasks per executor 23/04/13 15:48:33 INFO ResourceProfileManager: Added ResourceProfile id: 0 23/04/13 15:48:33 INFO SecurityManager: Changing view acls to: root 23/04/13 15:48:33 INFO SecurityManager: Changing modify acls to: root 23/04/13 15:48:33 INFO SecurityManager: Changing view acls groups to: 23/04/13 15:48:33 INFO SecurityManager: Changing modify acls groups to: 23/04/13 15:48:33 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set() 23/04/13 15:48:33 INFO deprecation: mapred.output.compression.codec is deprecated. Instead, use mapreduce.output.fileoutputformat.compress.codec 23/04/13 15:48:33 INFO deprecation: mapred.output.compression.type is deprecated. 
Instead, use mapreduce.output.fileoutputformat.compress.type 23/04/13 15:48:33 INFO deprecation: mapred.output.compress is deprecated. Instead, use mapreduce.output.fileoutputformat.compress 23/04/13 15:48:33 INFO Utils: Successfully started service 'sparkDriver' on port 43157. 23/04/13 15:48:33 INFO SparkEnv: Registering MapOutputTracker 23/04/13 15:48:33 INFO SparkEnv: Registering BlockManagerMaster 23/04/13 15:48:33 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information 23/04/13 15:48:33 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up 23/04/13 15:48:33 INFO SparkEnv: Registering BlockManagerMasterHeartbeat 23/04/13 15:48:33 INFO DiskBlockManager: Created local directory at /mnt/tmp/blockmgr-cbc3b241-36a8-45c6-aa3f-083f987dbb58 23/04/13 15:48:33 INFO MemoryStore: MemoryStore started with capacity 912.3 MiB 23/04/13 15:48:33 INFO SparkEnv: Registering OutputCommitCoordinator 23/04/13 15:48:33 INFO SubResultCacheManager: Sub-result caches are disabled. 23/04/13 15:48:34 INFO Utils: Successfully started service 'SparkUI' on port 8090. 
23/04/13 15:48:34 INFO SparkContext: Added JAR file:///usr/lib/hadoop/hadoop-distcp-3.3.3-amzn-1.jar at spark://ip-10-108-166-149.eu-central-1.compute.internal:43157/jars/hadoop-distcp-3.3.3-amzn-1.jar with timestamp 1681400913096 23/04/13 15:48:34 INFO SparkContext: Added JAR file:/usr/lib/hudi/hudi-utilities-bundle_2.12-0.12.1-amzn-0.jar at spark://ip-10-108-166-149.eu-central-1.compute.internal:43157/jars/hudi-utilities-bundle_2.12-0.12.1-amzn-0.jar with timestamp 1681400913096 23/04/13 15:48:34 INFO Executor: Starting executor ID driver on host ip-10-108-166-149.eu-central-1.compute.internal 23/04/13 15:48:34 INFO Executor: Starting executor with user classpath (userClassPathFirst = false): 'file:/usr/lib/hadoop-lzo/lib/*,file:/usr/lib/hadoop/hadoop-aws.jar,file:/usr/share/aws/aws-java-sdk/*,file:/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar,file:/usr/share/aws/emr/security/conf,file:/usr/share/aws/emr/security/lib/*,file:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar,file:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar,file:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar,file:/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar,file:/docker/usr/lib/hadoop-lzo/lib/*,file:/docker/usr/lib/hadoop/hadoop-aws.jar,file:/docker/usr/share/aws/aws-java-sdk/*,file:/docker/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar,file:/docker/usr/share/aws/emr/security/conf,file:/docker/usr/share/aws/emr/security/lib/*,file:/docker/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar,file:/docker/usr/share/java/Hiv 
e-JSON-Serde/hive-openx-serde.jar,file:/docker/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar,file:/docker/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar,file:/root/emr-spark-goodies.jar,file:/root/conf,file:/root/emr-s3-select-spark-connector.jar,file:/root/hadoop-aws.jar,file:/root/hive-openx-serde.jar,file:/root/sagemaker-spark-sdk.jar,file:/root/aws-glue-datacatalog-spark-client.jar,file:/root/*' 23/04/13 15:48:34 INFO Executor: Fetching spark://ip-10-108-166-149.eu-central-1.compute.internal:43157/jars/hadoop-distcp-3.3.3-amzn-1.jar with timestamp 1681400913096 23/04/13 15:48:34 INFO TransportClientFactory: Successfully created connection to ip-10-108-166-149.eu-central-1.compute.internal/10.108.166.149:43157 after37 ms (0 ms spent in bootstraps) 23/04/13 15:48:34 INFO Utils: Fetching spark://ip-10-108-166-149.eu-central-1.compute.internal:43157/jars/hadoop-distcp-3.3.3-amzn-1.jar to /mnt/tmp/spark-41a0e875-65a4-49e6-85dd-d9906be717b5/userFiles-af2662a0-3ce4-4692-ab2f-5c8a363dcdc2/fetchFileTemp894068807142647604.tmp 23/04/13 15:48:34 INFO Executor: Adding file:/mnt/tmp/spark-41a0e875-65a4-49e6-85dd-d9906be717b5/userFiles-af2662a0-3ce4-4692-ab2f-5c8a363dcdc2/hadoop-distcp-3.3.3-amzn-1.jar to class loader 23/04/13 15:48:34 INFO Executor: Fetching spark://ip-10-108-166-149.eu-central-1.compute.internal:43157/jars/hudi-utilities-bundle_2.12-0.12.1-amzn-0.jar with timestamp 1681400913096 23/04/13 15:48:34 INFO Utils: Fetching spark://ip-10-108-166-149.eu-central-1.compute.internal:43157/jars/hudi-utilities-bundle_2.12-0.12.1-amzn-0.jar to /mnt/tmp/spark-41a0e875-65a4-49e6-85dd-d9906be717b5/userFiles-af2662a0-3ce4-4692-ab2f-5c8a363dcdc2/fetchFileTemp7187610857861244070.tmp 23/04/13 15:48:34 INFO Executor: Adding file:/mnt/tmp/spark-41a0e875-65a4-49e6-85dd-d9906be717b5/userFiles-af2662a0-3ce4-4692-ab2f-5c8a363dcdc2/hudi-utilities-bundle_2.12-0.12.1-amzn-0.jar to class loader 23/04/13 15:48:34 INFO Utils: Successfully started service 
'org.apache.spark.network.netty.NettyBlockTransferService' on port 38925. 23/04/13 15:48:34 INFO NettyBlockTransferService: Server created on ip-10-108-166-149.eu-central-1.compute.internal:38925 23/04/13 15:48:34 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy 23/04/13 15:48:34 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, ip-10-108-166-149.eu-central-1.compute.internal, 38925, None) 23/04/13 15:48:34 INFO BlockManagerMasterEndpoint: Registering block manager ip-10-108-166-149.eu-central-1.compute.internal:38925 with 912.3 MiB RAM, BlockManagerId(driver, ip-10-108-166-149.eu-central-1.compute.internal, 38925, None) 23/04/13 15:48:34 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, ip-10-108-166-149.eu-central-1.compute.internal, 38925, None) 23/04/13 15:48:34 INFO BlockManager: external shuffle service port = 7337 23/04/13 15:48:34 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, ip-10-108-166-149.eu-central-1.compute.internal, 38925, None) 23/04/13 15:48:35 INFO SingleEventLogFileWriter: Logging events to hdfs:/var/log/spark/apps/local-1681400914190.inprogress 23/04/13 15:48:36 INFO ClientConfigurationFactory: Set initial getObject socket timeout to 2000 ms. 23/04/13 15:48:37 INFO Javalin: [Javalin ASCII-art banner omitted] https://javalin.io/documentation 23/04/13 15:48:37 INFO Javalin: Starting Javalin ... 
23/04/13 15:48:37 INFO Javalin: Listening on http://localhost:45175/ 23/04/13 15:48:37 INFO Javalin: Javalin started in 162ms \o/ 23/04/13 15:48:38 INFO S3NativeFileSystem: Opening 'PATH_TO_S3/hudiTable/.hoodie/hoodie.properties' for reading 23/04/13 15:48:39 INFO S3NativeFileSystem: Opening 'PATH_TO_S3/hudiTable/.hoodie/hoodie.properties' for reading 23/04/13 15:48:39 INFO S3NativeFileSystem: Opening 'PATH_TO_S3/hudiTable/.hoodie/metadata/.hoodie/hoodie.properties' for reading 23/04/13 15:48:39 INFO S3NativeFileSystem: Opening 'PATH_TO_S3/hudiTable/.hoodie/hoodie.properties' for reading 23/04/13 15:48:40 INFO S3NativeFileSystem: Opening 'PATH_TO_S3/hudiTable/.hoodie/hoodie.properties' for reading 23/04/13 15:48:40 INFO S3NativeFileSystem: Opening 'PATH_TO_S3/hudiTable/.hoodie/metadata/.hoodie/hoodie.properties' for reading 23/04/13 15:48:40 INFO S3NativeFileSystem: Opening 'PATH_TO_S3/hudiTable/.hoodie/20230413141447019.clean' for reading 23/04/13 15:48:40 INFO S3NativeFileSystem: Opening 'PATH_TO_S3/hudiTable/.hoodie/20230413141447019.clean' for reading 23/04/13 15:48:40 INFO S3NativeFileSystem: Opening 'PATH_TO_S3/hudiTable/.hoodie/hoodie.properties' for reading 23/04/13 15:48:41 INFO S3NativeFileSystem: Opening 'PATH_TO_S3/hudiTable/.hoodie/hoodie.properties' for reading 23/04/13 15:48:41 INFO S3NativeFileSystem: Opening 'PATH_TO_S3/hudiTable/.hoodie/metadata/.hoodie/hoodie.properties' for reading 23/04/13 15:48:41 INFO S3NativeFileSystem: Opening 'PATH_TO_S3/hudiTable/.hoodie/20230413141447019.clean' for reading 23/04/13 15:48:41 INFO S3NativeFileSystem: Opening 'PATH_TO_S3/hudiTable/.hoodie/20230413141447019.clean' for reading 23/04/13 15:48:42 INFO SparkContext: Starting job: collect at HoodieSparkEngineContext.java:137 23/04/13 15:48:42 INFO DAGScheduler: Got job 0 (collect at HoodieSparkEngineContext.java:137) with 1 output partitions 23/04/13 15:48:42 INFO DAGScheduler: Final stage: ResultStage 0 (collect at HoodieSparkEngineContext.java:137) 23/04/13 
15:48:42 INFO DAGScheduler: Parents of final stage: List() 23/04/13 15:48:42 INFO DAGScheduler: Missing parents: List() 23/04/13 15:48:42 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at flatMap at HoodieSparkEngineContext.java:137), which has no missing parents 23/04/13 15:48:42 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 112.6 KiB, free 912.2 MiB) 23/04/13 15:48:42 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 42.3 KiB, free 912.1 MiB) 23/04/13 15:48:42 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on ip-10-108-166-149.eu-central-1.compute.internal:38925 (size: 42.3 KiB, free: 912.3 MiB) 23/04/13 15:48:42 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1570 23/04/13 15:48:42 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at flatMap at HoodieSparkEngineContext.java:137) (first 15 tasks are for partitions Vector(0)) 23/04/13 15:48:42 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks resource profile 0 23/04/13 15:48:42 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0) (ip-10-108-166-149.eu-central-1.compute.internal, executor driver, partition 0, PROCESS_LOCAL, 4447 bytes) taskResourceAssignments Map() 23/04/13 15:48:42 INFO Executor: Running task 0.0 in stage 0.0 (TID 0) 23/04/13 15:48:43 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 2928 bytes result sent to driver 23/04/13 15:48:43 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 456 ms on ip-10-108-166-149.eu-central-1.compute.internal (executor driver)(1/1) 23/04/13 15:48:43 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 23/04/13 15:48:43 INFO DAGScheduler: ResultStage 0 (collect at HoodieSparkEngineContext.java:137) finished in 1.056 s 23/04/13 15:48:43 INFO DAGScheduler: Job 0 is finished. 
Cancelling potential speculative or zombie tasks for this job 23/04/13 15:48:43 INFO TaskSchedulerImpl: Killing all running tasks in stage 0: Stage finished 23/04/13 15:48:43 INFO DAGScheduler: Job 0 finished: collect at HoodieSparkEngineContext.java:137, took 1.161074 s 23/04/13 15:48:43 INFO SparkContext: Starting job: collect at HoodieSparkEngineContext.java:103 23/04/13 15:48:43 INFO DAGScheduler: Got job 1 (collect at HoodieSparkEngineContext.java:103) with 14 output partitions 23/04/13 15:48:43 INFO DAGScheduler: Final stage: ResultStage 1 (collect at HoodieSparkEngineContext.java:103) 23/04/13 15:48:43 INFO DAGScheduler: Parents of final stage: List() 23/04/13 15:48:43 INFO DAGScheduler: Missing parents: List() 23/04/13 15:48:43 INFO DAGScheduler: Submitting ResultStage 1 (MapPartitionsRDD[3] at map at HoodieSparkEngineContext.java:103), which has no missing parents 23/04/13 15:48:43 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 112.3 KiB, free 912.0 MiB) 23/04/13 15:48:43 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 42.3 KiB, free 912.0 MiB) 23/04/13 15:48:43 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on ip-10-108-166-149.eu-central-1.compute.internal:38925 (size: 42.3 KiB, free: 912.2 MiB) 23/04/13 15:48:43 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1570 23/04/13 15:48:43 INFO DAGScheduler: Submitting 14 missing tasks from ResultStage 1 (MapPartitionsRDD[3] at map at HoodieSparkEngineContext.java:103) (first15 tasks are for partitions Vector(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13)) 23/04/13 15:48:43 INFO TaskSchedulerImpl: Adding task set 1.0 with 14 tasks resource profile 0 23/04/13 15:48:43 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1) (ip-10-108-166-149.eu-central-1.compute.internal, executor driver, partition 0, PROCESS_LOCAL, 4598 bytes) taskResourceAssignments Map() 23/04/13 15:48:43 INFO Executor: Running task 
0.0 in stage 1.0 (TID 1) 23/04/13 15:48:43 INFO Executor: Finished task 0.0 in stage 1.0 (TID 1). 901 bytes result sent to driver 23/04/13 15:48:43 INFO TaskSetManager: Starting task 1.0 in stage 1.0 (TID 2) (ip-10-108-166-149.eu-central-1.compute.internal, executor driver, partition 1, PROCESS_LOCAL, 4613 bytes) taskResourceAssignments Map() 23/04/13 15:48:43 INFO Executor: Running task 1.0 in stage 1.0 (TID 2) 23/04/13 15:48:43 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 1) in 209 ms on ip-10-108-166-149.eu-central-1.compute.internal (executor driver)(1/14) 23/04/13 15:48:43 INFO BlockManagerInfo: Removed broadcast_0_piece0 on ip-10-108-166-149.eu-central-1.compute.internal:38925 in memory (size: 42.3 KiB, free: 912.3 MiB) 23/04/13 15:48:43 INFO Executor: Finished task 1.0 in stage 1.0 (TID 2). 968 bytes result sent to driver 23/04/13 15:48:43 INFO TaskSetManager: Starting task 2.0 in stage 1.0 (TID 3) (ip-10-108-166-149.eu-central-1.compute.internal, executor driver, partition 2, PROCESS_LOCAL, 4613 bytes) taskResourceAssignments Map() 23/04/13 15:48:43 INFO Executor: Running task 2.0 in stage 1.0 (TID 3) 23/04/13 15:48:43 INFO TaskSetManager: Finished task 1.0 in stage 1.0 (TID 2) in 79 ms on ip-10-108-166-149.eu-central-1.compute.internal (executor driver) (2/14) 23/04/13 15:48:43 INFO Executor: Finished task 2.0 in stage 1.0 (TID 3). 925 bytes result sent to driver 23/04/13 15:48:43 INFO TaskSetManager: Starting task 3.0 in stage 1.0 (TID 4) (ip-10-108-166-149.eu-central-1.compute.internal, executor driver, partition 3, PROCESS_LOCAL, 4613 bytes) taskResourceAssignments Map() 23/04/13 15:48:43 INFO Executor: Running task 3.0 in stage 1.0 (TID 4) 23/04/13 15:48:43 INFO TaskSetManager: Finished task 2.0 in stage 1.0 (TID 3) in 31 ms on ip-10-108-166-149.eu-central-1.compute.internal (executor driver) (3/14) 23/04/13 15:48:43 INFO Executor: Finished task 3.0 in stage 1.0 (TID 4). 
925 bytes result sent to driver 23/04/13 15:48:43 INFO TaskSetManager: Starting task 4.0 in stage 1.0 (TID 5) (ip-10-108-166-149.eu-central-1.compute.internal, executor driver, partition 4, PROCESS_LOCAL, 4613 bytes) taskResourceAssignments Map() 23/04/13 15:48:43 INFO TaskSetManager: Finished task 3.0 in stage 1.0 (TID 4) in 26 ms on ip-10-108-166-149.eu-central-1.compute.internal (executor driver) (4/14) 23/04/13 15:48:43 INFO Executor: Running task 4.0 in stage 1.0 (TID 5) 23/04/13 15:48:43 INFO Executor: Finished task 4.0 in stage 1.0 (TID 5). 925 bytes result sent to driver 23/04/13 15:48:43 INFO TaskSetManager: Starting task 5.0 in stage 1.0 (TID 6) (ip-10-108-166-149.eu-central-1.compute.internal, executor driver, partition 5, PROCESS_LOCAL, 4613 bytes) taskResourceAssignments Map() 23/04/13 15:48:43 INFO TaskSetManager: Finished task 4.0 in stage 1.0 (TID 5) in 31 ms on ip-10-108-166-149.eu-central-1.compute.internal (executor driver) (5/14) 23/04/13 15:48:43 INFO Executor: Running task 5.0 in stage 1.0 (TID 6) 23/04/13 15:48:43 INFO Executor: Finished task 5.0 in stage 1.0 (TID 6). 925 bytes result sent to driver 23/04/13 15:48:43 INFO TaskSetManager: Starting task 6.0 in stage 1.0 (TID 7) (ip-10-108-166-149.eu-central-1.compute.internal, executor driver, partition 6, PROCESS_LOCAL, 4613 bytes) taskResourceAssignments Map() 23/04/13 15:48:43 INFO Executor: Running task 6.0 in stage 1.0 (TID 7) 23/04/13 15:48:43 INFO TaskSetManager: Finished task 5.0 in stage 1.0 (TID 6) in 24 ms on ip-10-108-166-149.eu-central-1.compute.internal (executor driver) (6/14) 23/04/13 15:48:43 INFO Executor: Finished task 6.0 in stage 1.0 (TID 7). 
925 bytes result sent to driver 23/04/13 15:48:43 INFO TaskSetManager: Starting task 7.0 in stage 1.0 (TID 8) (ip-10-108-166-149.eu-central-1.compute.internal, executor driver, partition 7, PROCESS_LOCAL, 4613 bytes) taskResourceAssignments Map() 23/04/13 15:48:43 INFO TaskSetManager: Finished task 6.0 in stage 1.0 (TID 7) in 30 ms on ip-10-108-166-149.eu-central-1.compute.internal (executor driver) (7/14) 23/04/13 15:48:43 INFO Executor: Running task 7.0 in stage 1.0 (TID 8) 23/04/13 15:48:43 INFO Executor: Finished task 7.0 in stage 1.0 (TID 8). 925 bytes result sent to driver 23/04/13 15:48:43 INFO TaskSetManager: Starting task 8.0 in stage 1.0 (TID 9) (ip-10-108-166-149.eu-central-1.compute.internal, executor driver, partition 8, PROCESS_LOCAL, 4613 bytes) taskResourceAssignments Map() 23/04/13 15:48:43 INFO Executor: Running task 8.0 in stage 1.0 (TID 9) 23/04/13 15:48:43 INFO TaskSetManager: Finished task 7.0 in stage 1.0 (TID 8) in 36 ms on ip-10-108-166-149.eu-central-1.compute.internal (executor driver) (8/14) 23/04/13 15:48:43 INFO Executor: Finished task 8.0 in stage 1.0 (TID 9). 925 bytes result sent to driver 23/04/13 15:48:43 INFO TaskSetManager: Starting task 9.0 in stage 1.0 (TID 10) (ip-10-108-166-149.eu-central-1.compute.internal, executor driver, partition 9, PROCESS_LOCAL, 4613 bytes) taskResourceAssignments Map() 23/04/13 15:48:43 INFO Executor: Running task 9.0 in stage 1.0 (TID 10) 23/04/13 15:48:43 INFO TaskSetManager: Finished task 8.0 in stage 1.0 (TID 9) in 29 ms on ip-10-108-166-149.eu-central-1.compute.internal (executor driver) (9/14) 23/04/13 15:48:43 INFO Executor: Finished task 9.0 in stage 1.0 (TID 10). 
925 bytes result sent to driver 23/04/13 15:48:43 INFO TaskSetManager: Starting task 10.0 in stage 1.0 (TID 11) (ip-10-108-166-149.eu-central-1.compute.internal, executor driver, partition 10, PROCESS_LOCAL, 4613 bytes) taskResourceAssignments Map() 23/04/13 15:48:43 INFO Executor: Running task 10.0 in stage 1.0 (TID 11) 23/04/13 15:48:43 INFO TaskSetManager: Finished task 9.0 in stage 1.0 (TID 10) in 24 ms on ip-10-108-166-149.eu-central-1.compute.internal (executor driver) (10/14) 23/04/13 15:48:43 INFO Executor: Finished task 10.0 in stage 1.0 (TID 11). 925 bytes result sent to driver 23/04/13 15:48:43 INFO TaskSetManager: Starting task 11.0 in stage 1.0 (TID 12) (ip-10-108-166-149.eu-central-1.compute.internal, executor driver, partition 11, PROCESS_LOCAL, 4613 bytes) taskResourceAssignments Map() 23/04/13 15:48:43 INFO TaskSetManager: Finished task 10.0 in stage 1.0 (TID 11) in 24 ms on ip-10-108-166-149.eu-central-1.compute.internal (executor driver) (11/14) 23/04/13 15:48:43 INFO Executor: Running task 11.0 in stage 1.0 (TID 12) 23/04/13 15:48:43 INFO Executor: Finished task 11.0 in stage 1.0 (TID 12). 925 bytes result sent to driver 23/04/13 15:48:43 INFO TaskSetManager: Starting task 12.0 in stage 1.0 (TID 13) (ip-10-108-166-149.eu-central-1.compute.internal, executor driver, partition 12, PROCESS_LOCAL, 4613 bytes) taskResourceAssignments Map() 23/04/13 15:48:43 INFO TaskSetManager: Finished task 11.0 in stage 1.0 (TID 12) in 25 ms on ip-10-108-166-149.eu-central-1.compute.internal (executor driver) (12/14) 23/04/13 15:48:43 INFO Executor: Running task 12.0 in stage 1.0 (TID 13) 23/04/13 15:48:43 INFO Executor: Finished task 12.0 in stage 1.0 (TID 13). 
925 bytes result sent to driver 23/04/13 15:48:43 INFO TaskSetManager: Starting task 13.0 in stage 1.0 (TID 14) (ip-10-108-166-149.eu-central-1.compute.internal, executor driver, partition 13, PROCESS_LOCAL, 4613 bytes) taskResourceAssignments Map() 23/04/13 15:48:43 INFO TaskSetManager: Finished task 12.0 in stage 1.0 (TID 13) in 25 ms on ip-10-108-166-149.eu-central-1.compute.internal (executor driver) (13/14) 23/04/13 15:48:43 INFO Executor: Running task 13.0 in stage 1.0 (TID 14) 23/04/13 15:48:43 INFO Executor: Finished task 13.0 in stage 1.0 (TID 14). 925 bytes result sent to driver 23/04/13 15:48:43 INFO TaskSetManager: Finished task 13.0 in stage 1.0 (TID 14) in 26 ms on ip-10-108-166-149.eu-central-1.compute.internal (executor driver) (14/14) 23/04/13 15:48:43 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool 23/04/13 15:48:43 INFO DAGScheduler: ResultStage 1 (collect at HoodieSparkEngineContext.java:103) finished in 0.603 s 23/04/13 15:48:43 INFO DAGScheduler: Job 1 is finished. 
Cancelling potential speculative or zombie tasks for this job 23/04/13 15:48:43 INFO TaskSchedulerImpl: Killing all running tasks in stage 1: Stage finished 23/04/13 15:48:43 INFO DAGScheduler: Job 1 finished: collect at HoodieSparkEngineContext.java:103, took 0.614621 s 23/04/13 15:48:44 INFO SparkContext: Starting job: collect at HoodieSparkEngineContext.java:103 23/04/13 15:48:44 INFO DAGScheduler: Got job 2 (collect at HoodieSparkEngineContext.java:103) with 13 output partitions 23/04/13 15:48:44 INFO DAGScheduler: Final stage: ResultStage 2 (collect at HoodieSparkEngineContext.java:103) 23/04/13 15:48:44 INFO DAGScheduler: Parents of final stage: List() 23/04/13 15:48:44 INFO DAGScheduler: Missing parents: List() 23/04/13 15:48:44 INFO DAGScheduler: Submitting ResultStage 2 (MapPartitionsRDD[5] at map at HoodieSparkEngineContext.java:103), which has no missing parents 23/04/13 15:48:44 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 680.3 KiB, free 911.5 MiB) 23/04/13 15:48:44 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 252.0 KiB, free 911.2 MiB) 23/04/13 15:48:44 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on ip-10-108-166-149.eu-central-1.compute.internal:38925 (size: 252.0 KiB, free:912.0 MiB) 23/04/13 15:48:44 INFO SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:1570 23/04/13 15:48:44 INFO DAGScheduler: Submitting 13 missing tasks from ResultStage 2 (MapPartitionsRDD[5] at map at HoodieSparkEngineContext.java:103) (first15 tasks are for partitions Vector(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)) 23/04/13 15:48:44 INFO TaskSchedulerImpl: Adding task set 2.0 with 13 tasks resource profile 0 23/04/13 15:48:44 INFO TaskSetManager: Starting task 0.0 in stage 2.0 (TID 15) (ip-10-108-166-149.eu-central-1.compute.internal, executor driver, partition 0, PROCESS_LOCAL, 4356 bytes) taskResourceAssignments Map() 23/04/13 15:48:44 INFO Executor: Running task 
0.0 in stage 2.0 (TID 15) 23/04/13 15:48:45 INFO BlockManagerInfo: Removed broadcast_1_piece0 on ip-10-108-166-149.eu-central-1.compute.internal:38925 in memory (size: 42.3 KiB, free: 912.1 MiB) 23/04/13 15:48:45 INFO MetricsConfig: Loaded properties from hadoop-metrics2.properties 23/04/13 15:48:45 INFO MetricsSystemImpl: Scheduled Metric snapshot period at 300 second(s). 23/04/13 15:48:45 INFO MetricsSystemImpl: HBase metrics system started 23/04/13 15:48:45 INFO S3NativeFileSystem: Opening 'PATH_TO_S3/hudiTable/.hoodie/metadata/files/files-0000_0-28073-422423_20230411160658276001.hfile' for reading 23/04/13 15:48:46 INFO ZlibFactory: Successfully loaded & initialized native-zlib library 23/04/13 15:48:46 INFO CodecPool: Got brand-new decompressor [.gz] 23/04/13 15:48:46 INFO S3NativeFileSystem: Opening 'PATH_TO_S3/hudiTable/.hoodie/20230323153445618.rollback' for reading 23/04/13 15:48:46 INFO S3NativeFileSystem: Opening 'PATH_TO_S3/hudiTable/.hoodie/20230324094400516.rollback' for reading 23/04/13 15:48:46 INFO S3NativeFileSystem: Opening 'PATH_TO_S3/hudiTable/.hoodie/20230324144931601.rollback' for reading 23/04/13 15:48:46 INFO S3NativeFileSystem: Opening 'PATH_TO_S3/hudiTable/.hoodie/20230330105151611.rollback' for reading 23/04/13 15:48:46 INFO S3NativeFileSystem: Opening 'PATH_TO_S3/hudiTable/.hoodie/20230330113040739.rollback' for reading 23/04/13 15:48:46 INFO S3NativeFileSystem: Opening 'PATH_TO_S3/hudiTable/.hoodie/20230405140420990.rollback' for reading 23/04/13 15:48:46 INFO S3NativeFileSystem: Opening 'PATH_TO_S3/hudiTable/.hoodie/20230405141839408.rollback' for reading 23/04/13 15:48:46 INFO S3NativeFileSystem: Opening 'PATH_TO_S3/hudiTable/.hoodie/20230405144726404.rollback' for reading 23/04/13 15:48:47 INFO S3NativeFileSystem: Opening 'PATH_TO_S3/hudiTable/.hoodie/20230405151032798.rollback' for reading 23/04/13 15:48:47 INFO S3NativeFileSystem: Opening 'PATH_TO_S3/hudiTable/.hoodie/20230405152220270.rollback' for reading 23/04/13 15:48:47 INFO 
S3NativeFileSystem: Opening 'PATH_TO_S3/hudiTable/.hoodie/20230405152832205.rollback' for reading 23/04/13 15:48:47 INFO S3NativeFileSystem: Opening 'PATH_TO_S3/hudiTable/.hoodie/20230405160019842.rollback' for reading 23/04/13 15:48:47 INFO S3NativeFileSystem: Opening 'PATH_TO_S3/hudiTable/.hoodie/20230406073630950.rollback' for reading 23/04/13 15:48:47 INFO S3NativeFileSystem: Opening 'PATH_TO_S3/hudiTable/.hoodie/20230407140042928.rollback' for reading 23/04/13 15:48:47 INFO S3NativeFileSystem: Opening 'PATH_TO_S3/hudiTable/.hoodie/20230407140100444.rollback' for reading 23/04/13 15:48:47 INFO S3NativeFileSystem: Opening 'PATH_TO_S3/hudiTable/.hoodie/20230411072649180.rollback' for reading 23/04/13 15:48:47 INFO S3NativeFileSystem: Opening 'PATH_TO_S3/hudiTable/.hoodie/20230413141002691.rollback' for reading 23/04/13 15:48:47 INFO S3NativeFileSystem: Opening 'PATH_TO_S3/hudiTable/.hoodie/20230413141944579.rollback' for reading 23/04/13 15:48:47 INFO S3NativeFileSystem: Opening 'PATH_TO_S3/hudiTable/.hoodie/metadata/.hoodie/hoodie.properties' for reading # WARNING: Unable to attach Serviceability Agent. Unable to attach even with module exceptions: [org.apache.hudi.org.openjdk.jol.vm.sa.SASupportException: Sense failed., org.apache.hudi.org.openjdk.jol.vm.sa.SASupportException: Sense failed., org.apache.hudi.org.openjdk.jol.vm.sa.SASupportException: Sense failed.] 
23/04/13 15:48:49 INFO S3NativeFileSystem: Opening 'PATH_TO_S3/hudiTable/.hoodie/metadata/files/.files-0000_20230411160658276001.log.1_0-28080-422430' for reading 23/04/13 15:48:49 INFO S3NativeFileSystem: Opening 'PATH_TO_S3/hudiTable/.hoodie/metadata/files/.files-0000_20230411160658276001.log.2_0-28097-422495' for reading 23/04/13 15:48:49 INFO S3NativeFileSystem: Opening 'PATH_TO_S3/hudiTable/.hoodie/metadata/files/.files-0000_20230411160658276001.log.3_0-28214-424972' for reading 23/04/13 15:48:49 INFO S3NativeFileSystem: Opening 'PATH_TO_S3/hudiTable/.hoodie/metadata/files/.files-0000_20230411160658276001.log.4_0-28269-425284' for reading 23/04/13 15:48:49 INFO S3NativeFileSystem: Opening 'PATH_TO_S3/hudiTable/.hoodie/metadata/files/.files-0000_20230411160658276001.log.5_0-118-2431' for reading 23/04/13 15:48:49 INFO S3NativeFileSystem: Opening 'PATH_TO_S3/hudiTable/.hoodie/metadata/files/.files-0000_20230411160658276001.log.5_0-120-2432' for reading 23/04/13 15:48:50 INFO S3NativeFileSystem: Opening 'PATH_TO_S3/hudiTable/.hoodie/metadata/files/.files-0000_20230411160658276001.log.6_0-145-2503' for reading 23/04/13 15:48:50 INFO S3NativeFileSystem: Opening 'PATH_TO_S3/hudiTable/.hoodie/metadata/files/.files-0000_20230411160658276001.log.7_0-197-2812' for reading 23/04/13 15:48:50 INFO S3NativeFileSystem: Opening 'PATH_TO_S3/hudiTable/.hoodie/metadata/files/.files-0000_20230411160658276001.log.8_0-323-5348' for reading 23/04/13 15:48:50 INFO S3NativeFileSystem: Opening 'PATH_TO_S3/hudiTable/.hoodie/metadata/files/.files-0000_20230411160658276001.log.9_0-329-5353' for reading 23/04/13 15:48:50 INFO S3NativeFileSystem: Opening 'PATH_TO_S3/hudiTable/.hoodie/metadata/files/.files-0000_20230411160658276001.log.10_0-112-2424' for reading 23/04/13 15:48:51 INFO Executor: Finished task 0.0 in stage 2.0 (TID 15). 
955 bytes result sent to driver 23/04/13 15:48:51 INFO TaskSetManager: Starting task 1.0 in stage 2.0 (TID 16) (ip-10-108-166-149.eu-central-1.compute.internal, executor driver, partition 1, PROCESS_LOCAL, 4356 bytes) taskResourceAssignments Map() 23/04/13 15:48:51 INFO Executor: Running task 1.0 in stage 2.0 (TID 16) 23/04/13 15:48:51 INFO TaskSetManager: Finished task 0.0 in stage 2.0 (TID 15) in 6734 ms on ip-10-108-166-149.eu-central-1.compute.internal (executor driver) (1/13) 23/04/13 15:48:51 INFO Executor: Finished task 1.0 in stage 2.0 (TID 16). 912 bytes result sent to driver 23/04/13 15:48:51 INFO TaskSetManager: Starting task 2.0 in stage 2.0 (TID 17) (ip-10-108-166-149.eu-central-1.compute.internal, executor driver, partition 2, PROCESS_LOCAL, 4356 bytes) taskResourceAssignments Map() 23/04/13 15:48:51 INFO TaskSetManager: Finished task 1.0 in stage 2.0 (TID 16) in 179 ms on ip-10-108-166-149.eu-central-1.compute.internal (executor driver) (2/13) 23/04/13 15:48:51 INFO Executor: Running task 2.0 in stage 2.0 (TID 17) 23/04/13 15:48:51 INFO Executor: Finished task 2.0 in stage 2.0 (TID 17). 912 bytes result sent to driver 23/04/13 15:48:51 INFO TaskSetManager: Starting task 3.0 in stage 2.0 (TID 18) (ip-10-108-166-149.eu-central-1.compute.internal, executor driver, partition 3, PROCESS_LOCAL, 4356 bytes) taskResourceAssignments Map() 23/04/13 15:48:51 INFO TaskSetManager: Finished task 2.0 in stage 2.0 (TID 17) in 168 ms on ip-10-108-166-149.eu-central-1.compute.internal (executor driver) (3/13) 23/04/13 15:48:51 INFO Executor: Running task 3.0 in stage 2.0 (TID 18) 23/04/13 15:48:51 INFO Executor: Finished task 3.0 in stage 2.0 (TID 18). 
912 bytes result sent to driver 23/04/13 15:48:51 INFO TaskSetManager: Starting task 4.0 in stage 2.0 (TID 19) (ip-10-108-166-149.eu-central-1.compute.internal, executor driver, partition 4, PROCESS_LOCAL, 4356 bytes) taskResourceAssignments Map() 23/04/13 15:48:51 INFO TaskSetManager: Finished task 3.0 in stage 2.0 (TID 18) in 168 ms on ip-10-108-166-149.eu-central-1.compute.internal (executor driver) (4/13) 23/04/13 15:48:51 INFO Executor: Running task 4.0 in stage 2.0 (TID 19) 23/04/13 15:48:51 INFO Executor: Finished task 4.0 in stage 2.0 (TID 19). 912 bytes result sent to driver 23/04/13 15:48:51 INFO TaskSetManager: Starting task 5.0 in stage 2.0 (TID 20) (ip-10-108-166-149.eu-central-1.compute.internal, executor driver, partition 5, PROCESS_LOCAL, 4356 bytes) taskResourceAssignments Map() 23/04/13 15:48:51 INFO TaskSetManager: Finished task 4.0 in stage 2.0 (TID 19) in 174 ms on ip-10-108-166-149.eu-central-1.compute.internal (executor driver) (5/13) 23/04/13 15:48:51 INFO Executor: Running task 5.0 in stage 2.0 (TID 20) 23/04/13 15:48:51 INFO Executor: Finished task 5.0 in stage 2.0 (TID 20). 912 bytes result sent to driver 23/04/13 15:48:51 INFO TaskSetManager: Starting task 6.0 in stage 2.0 (TID 21) (ip-10-108-166-149.eu-central-1.compute.internal, executor driver, partition 6, PROCESS_LOCAL, 4356 bytes) taskResourceAssignments Map() 23/04/13 15:48:51 INFO TaskSetManager: Finished task 5.0 in stage 2.0 (TID 20) in 154 ms on ip-10-108-166-149.eu-central-1.compute.internal (executor driver) (6/13) 23/04/13 15:48:51 INFO Executor: Running task 6.0 in stage 2.0 (TID 21) 23/04/13 15:48:52 INFO Executor: Finished task 6.0 in stage 2.0 (TID 21). 
912 bytes result sent to driver 23/04/13 15:48:52 INFO TaskSetManager: Starting task 7.0 in stage 2.0 (TID 22) (ip-10-108-166-149.eu-central-1.compute.internal, executor driver, partition 7, PROCESS_LOCAL, 4356 bytes) taskResourceAssignments Map() 23/04/13 15:48:52 INFO TaskSetManager: Finished task 6.0 in stage 2.0 (TID 21) in 166 ms on ip-10-108-166-149.eu-central-1.compute.internal (executor driver) (7/13) 23/04/13 15:48:52 INFO Executor: Running task 7.0 in stage 2.0 (TID 22) 23/04/13 15:48:52 INFO Executor: Finished task 7.0 in stage 2.0 (TID 22). 912 bytes result sent to driver 23/04/13 15:48:52 INFO TaskSetManager: Starting task 8.0 in stage 2.0 (TID 23) (ip-10-108-166-149.eu-central-1.compute.internal, executor driver, partition 8, PROCESS_LOCAL, 4356 bytes) taskResourceAssignments Map() 23/04/13 15:48:52 INFO TaskSetManager: Finished task 7.0 in stage 2.0 (TID 22) in 150 ms on ip-10-108-166-149.eu-central-1.compute.internal (executor driver) (8/13) 23/04/13 15:48:52 INFO Executor: Running task 8.0 in stage 2.0 (TID 23) 23/04/13 15:48:52 INFO Executor: Finished task 8.0 in stage 2.0 (TID 23). 912 bytes result sent to driver 23/04/13 15:48:52 INFO TaskSetManager: Starting task 9.0 in stage 2.0 (TID 24) (ip-10-108-166-149.eu-central-1.compute.internal, executor driver, partition 9, PROCESS_LOCAL, 4356 bytes) taskResourceAssignments Map() 23/04/13 15:48:52 INFO Executor: Running task 9.0 in stage 2.0 (TID 24) 23/04/13 15:48:52 INFO TaskSetManager: Finished task 8.0 in stage 2.0 (TID 23) in 185 ms on ip-10-108-166-149.eu-central-1.compute.internal (executor driver) (9/13) 23/04/13 15:48:52 INFO Executor: Finished task 9.0 in stage 2.0 (TID 24). 
955 bytes result sent to driver 23/04/13 15:48:52 INFO TaskSetManager: Starting task 10.0 in stage 2.0 (TID 25) (ip-10-108-166-149.eu-central-1.compute.internal, executor driver, partition 10, PROCESS_LOCAL, 4356 bytes) taskResourceAssignments Map() 23/04/13 15:48:52 INFO TaskSetManager: Finished task 9.0 in stage 2.0 (TID 24) in 217 ms on ip-10-108-166-149.eu-central-1.compute.internal (executor driver) (10/13) 23/04/13 15:48:52 INFO Executor: Running task 10.0 in stage 2.0 (TID 25) 23/04/13 15:48:52 INFO Executor: Finished task 10.0 in stage 2.0 (TID 25). 912 bytes result sent to driver 23/04/13 15:48:52 INFO TaskSetManager: Starting task 11.0 in stage 2.0 (TID 26) (ip-10-108-166-149.eu-central-1.compute.internal, executor driver, partition 11, PROCESS_LOCAL, 4356 bytes) taskResourceAssignments Map() 23/04/13 15:48:52 INFO TaskSetManager: Finished task 10.0 in stage 2.0 (TID 25) in 166 ms on ip-10-108-166-149.eu-central-1.compute.internal (executor driver) (11/13) 23/04/13 15:48:52 INFO Executor: Running task 11.0 in stage 2.0 (TID 26) 23/04/13 15:48:52 INFO Executor: Finished task 11.0 in stage 2.0 (TID 26). 912 bytes result sent to driver 23/04/13 15:48:52 INFO TaskSetManager: Starting task 12.0 in stage 2.0 (TID 27) (ip-10-108-166-149.eu-central-1.compute.internal, executor driver, partition 12, PROCESS_LOCAL, 4356 bytes) taskResourceAssignments Map() 23/04/13 15:48:52 INFO Executor: Running task 12.0 in stage 2.0 (TID 27) 23/04/13 15:48:52 INFO TaskSetManager: Finished task 11.0 in stage 2.0 (TID 26) in 152 ms on ip-10-108-166-149.eu-central-1.compute.internal (executor driver) (12/13) 23/04/13 15:48:53 INFO Executor: Finished task 12.0 in stage 2.0 (TID 27). 
912 bytes result sent to driver 23/04/13 15:48:53 INFO TaskSetManager: Finished task 12.0 in stage 2.0 (TID 27) in 168 ms on ip-10-108-166-149.eu-central-1.compute.internal (executor driver) (13/13) 23/04/13 15:48:53 INFO TaskSchedulerImpl: Removed TaskSet 2.0, whose tasks have all completed, from pool 23/04/13 15:48:53 INFO DAGScheduler: ResultStage 2 (collect at HoodieSparkEngineContext.java:103) finished in 8.826 s 23/04/13 15:48:53 INFO DAGScheduler: Job 2 is finished. Cancelling potential speculative or zombie tasks for this job 23/04/13 15:48:53 INFO TaskSchedulerImpl: Killing all running tasks in stage 2: Stage finished 23/04/13 15:48:53 INFO DAGScheduler: Job 2 finished: collect at HoodieSparkEngineContext.java:103, took 8.834619 s 23/04/13 15:48:53 INFO SparkUI: Stopped Spark web UI at http://ip-10-108-166-149.eu-central-1.compute.internal:8090 23/04/13 15:48:53 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped! 23/04/13 15:48:53 INFO MemoryStore: MemoryStore cleared 23/04/13 15:48:53 INFO BlockManager: BlockManager stopped 23/04/13 15:48:53 INFO BlockManagerMaster: BlockManagerMaster stopped 23/04/13 15:48:53 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped! 23/04/13 15:48:53 INFO SparkContext: Successfully stopped SparkContext
