maheshguptags opened a new issue, #6903:
URL: https://github.com/apache/hudi/issues/6903
Hi Team,
I am trying to perform offline compaction on a Hudi MOR table using Spark.
For that, I have set up inline scheduling in my Spark write code, and for
execution I am using the `HoodieCompactor` class.
**To Reproduce**
Steps to reproduce the behaviour:
1. Scheduling configuration
```
hudi_options_write = {
    'hoodie.datasource.write.table.type': 'MERGE_ON_READ',
    'hoodie.datasource.write.recordkey.field': 'a,b,c',
    'hoodie.table.name': tableName,
    'hoodie.datasource.write.hive_style_partitioning': 'false',
    'hoodie.archivelog.folder': 'archived',
    'hoodie.datasource.write.operation': 'upsert',
    'hoodie.datasource.write.partitionpath.field': 'a',
    # ComplexKeyGenerator allows multiple record key fields
    'hoodie.datasource.write.keygenerator.class': 'org.apache.hudi.keygen.ComplexKeyGenerator',
    'hoodie.datasource.write.partitionpath.urlencode': 'false',
    'hoodie.upsert.shuffle.parallelism': 2,
    'hoodie.timeline.layout.version': 1,
    'hoodie.datasource.write.precombine.field': 'b',
    'hoodie.compact.inline': 'false',
    'hoodie.datasource.compaction.async.enable': 'false',
    'hoodie.compact.schedule.inline': 'true',
    'hoodie.compact.inline.max.delta.commits': 5,
    'hoodie.table.timeline.timezone': 'utc'
}
```
2. Execution using the class below
```
spark-submit --class org.apache.hudi.utilities.HoodieCompactor \
  --jars /usr/lib/hudi/hudi-spark3-bundle_2.12-0.10.1-amzn-0.jar \
  /usr/lib/hudi/hudi-utilities-bundle_2.12-0.10.1-amzn-0.jar \
  --base-path "s3://test-spark-hudi/test_campaign_event_offline_compact_v1/" \
  --table-name "customer_event_offline_v1" \
  --schema-file "s3://test-spark-hudi/schema/offline_compact.avsc" \
  --schedule \
  --strategy "org.apache.hudi.table.action.compact.strategy.LogFileSizeBasedCompactionStrategy" \
  --instant-time "20221007120816651" \
  --spark-memory 1g \
  --parallelism 2
```
**Expected behavior**
With the scheduling configuration above, the Spark write job should generate
a `compaction.requested` file after every five delta commits, as per the
default behaviour, but it is not generating one.
Secondly, when I try to run the scheduling from **`hudi-cli>`** using
`compaction schedule`, the behaviour is inconsistent (sometimes it works and
sometimes it doesn't), and I am not sure why. I have attached the stack trace
for this as well.
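To make the expectation concrete, here is a minimal Python sketch of the "schedule after N delta commits" rule applied to a listing of the `.hoodie` folder. It is a simplified model, not Hudi's actual implementation: the helper names are hypothetical, the file naming (`<instant>.deltacommit`, `<instant>.compaction.requested`, `<instant>.compaction.inflight`, and `<instant>.commit` for a completed compaction) is assumed, and the count is assumed to restart at the latest compaction instant, whether pending or completed.

```python
import re

# Assumed timeline file naming: "<instant>.<action>[.<state>]".
_TIMELINE_FILE = re.compile(
    r"^(\d+)\.(deltacommit|commit|compaction)(?:\.(requested|inflight))?$"
)


def delta_commits_since_last_compaction(file_names):
    """Count completed delta commits newer than the latest compaction instant.

    Simplification: a pending compaction ("<instant>.compaction.*") and a
    completed one (assumed to appear as "<instant>.commit") both reset the count.
    Instants are compared as strings, which assumes equal-length timestamps.
    """
    last_compaction = ""
    delta_commits = []
    for name in file_names:
        match = _TIMELINE_FILE.match(name)
        if not match:
            continue  # e.g. hoodie.properties or other non-timeline files
        instant, action, state = match.groups()
        if action == "deltacommit" and state is None:
            delta_commits.append(instant)
        elif action == "compaction" or (action == "commit" and state is None):
            last_compaction = max(last_compaction, instant)
    return sum(1 for instant in delta_commits if instant > last_compaction)


def compaction_expected(file_names, max_delta_commits=5):
    """True when the simplified rule says a compaction plan should be scheduled."""
    return delta_commits_since_last_compaction(file_names) >= max_delta_commits
```

Feeding it the file names from `s3://.../.hoodie/` (listed by whatever tool is at hand) gives a quick check of whether five completed delta commits have actually accumulated since the last compaction.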
**Environment Description**
* EMR Version : emr-6.6.0
* Hudi version : 0.10.1 & 0.11 (tried on both)
* Spark version : Spark 3.2.0-amzn-0
* Hive version : Hive 3.1.2
* Hadoop version : Hadoop Amazon 3.2.1
* Storage (HDFS/S3/GCS..) : S3
* Running on Docker? (yes/no) : NO
**Additional context**
**Stacktrace**
```
compaction schedule
22/10/10 05:39:45 INFO SparkContext: Running Spark version 3.2.0-amzn-0
22/10/10 05:39:45 INFO ResourceUtils:
==============================================================
22/10/10 05:39:45 INFO ResourceUtils: No custom resources configured for
spark.driver.
22/10/10 05:39:45 INFO ResourceUtils:
==============================================================
22/10/10 05:39:45 INFO SparkContext: Submitted application:
hoodie-cli-COMPACT_SCHEDULE
22/10/10 05:39:45 INFO ResourceProfile: Default ResourceProfile created,
executor resources: Map(cores -> name: cores, amount: 4, script: , vendor: ,
memory -> name: memory, amount: 1024, script: , vendor: , offHeap -> name:
offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name:
cpus, amount: 1.0)
22/10/10 05:39:45 INFO ResourceProfile: Limiting resource is cpus at 4 tasks
per executor
22/10/10 05:39:45 INFO ResourceProfileManager: Added ResourceProfile id: 0
22/10/10 05:39:45 INFO SecurityManager: Changing view acls to: hadoop
22/10/10 05:39:45 INFO SecurityManager: Changing modify acls to: hadoop
22/10/10 05:39:45 INFO SecurityManager: Changing view acls groups to:
22/10/10 05:39:45 INFO SecurityManager: Changing modify acls groups to:
22/10/10 05:39:45 INFO SecurityManager: SecurityManager: authentication
disabled; ui acls disabled; users with view permissions: Set(hadoop); groups
with view permissions: Set(); users with modify permissions: Set(hadoop);
groups with modify permissions: Set()
22/10/10 05:39:45 INFO deprecation: mapred.output.compression.codec is
deprecated. Instead, use mapreduce.output.fileoutputformat.compress.codec
22/10/10 05:39:45 INFO deprecation: mapred.output.compression.type is
deprecated. Instead, use mapreduce.output.fileoutputformat.compress.type
22/10/10 05:39:45 INFO deprecation: mapred.output.compress is deprecated.
Instead, use mapreduce.output.fileoutputformat.compress
22/10/10 05:39:45 INFO Utils: Successfully started service 'sparkDriver' on
port 37043.
22/10/10 05:39:45 INFO SparkEnv: Registering MapOutputTracker
22/10/10 05:39:45 INFO SparkEnv: Registering BlockManagerMaster
22/10/10 05:39:45 INFO BlockManagerMasterEndpoint: Using
org.apache.spark.storage.DefaultTopologyMapper for getting topology information
22/10/10 05:39:45 INFO BlockManagerMasterEndpoint:
BlockManagerMasterEndpoint up
22/10/10 05:39:45 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
22/10/10 05:39:45 INFO DiskBlockManager: Created local directory at
/mnt/tmp/blockmgr-a788d127-7fc5-4af7-99c1-3867375f3887
22/10/10 05:39:45 INFO MemoryStore: MemoryStore started with capacity 912.3
MiB
22/10/10 05:39:45 INFO SparkEnv: Registering OutputCommitCoordinator
22/10/10 05:39:45 INFO SubResultCacheManager: Sub-result caches are disabled.
22/10/10 05:39:45 INFO log: Logging initialized @2490ms to
org.sparkproject.jetty.util.log.Slf4jLog
22/10/10 05:39:45 INFO Server: jetty-9.4.43.v20210629; built:
2021-06-30T11:07:22.254Z; git: 526006ecfa3af7f1a27ef3a288e2bef7ea9dd7e8; jvm
1.8.0_342-b07
22/10/10 05:39:45 INFO Server: Started @2596ms
22/10/10 05:39:46 INFO AbstractConnector: Started
ServerConnector@ecfbe91{HTTP/1.1, (http/1.1)}{0.0.0.0:4040}
22/10/10 05:39:46 INFO Utils: Successfully started service 'SparkUI' on port
4040.
22/10/10 05:39:46 INFO ContextHandler: Started
o.s.j.s.ServletContextHandler@5ac7aa18{/jobs,null,AVAILABLE,@Spark}
22/10/10 05:39:46 INFO ContextHandler: Started
o.s.j.s.ServletContextHandler@13047d7d{/jobs/json,null,AVAILABLE,@Spark}
22/10/10 05:39:46 INFO ContextHandler: Started
o.s.j.s.ServletContextHandler@65bb9029{/jobs/job,null,AVAILABLE,@Spark}
22/10/10 05:39:46 INFO ContextHandler: Started
o.s.j.s.ServletContextHandler@49601f82{/jobs/job/json,null,AVAILABLE,@Spark}
22/10/10 05:39:46 INFO ContextHandler: Started
o.s.j.s.ServletContextHandler@2b8d084{/stages,null,AVAILABLE,@Spark}
22/10/10 05:39:46 INFO ContextHandler: Started
o.s.j.s.ServletContextHandler@24fabd0f{/stages/json,null,AVAILABLE,@Spark}
22/10/10 05:39:46 INFO ContextHandler: Started
o.s.j.s.ServletContextHandler@61f3fbb8{/stages/stage,null,AVAILABLE,@Spark}
22/10/10 05:39:46 INFO ContextHandler: Started
o.s.j.s.ServletContextHandler@60e5272{/stages/stage/json,null,AVAILABLE,@Spark}
22/10/10 05:39:46 INFO ContextHandler: Started
o.s.j.s.ServletContextHandler@69c93ca4{/stages/pool,null,AVAILABLE,@Spark}
22/10/10 05:39:46 INFO ContextHandler: Started
o.s.j.s.ServletContextHandler@173373b4{/stages/pool/json,null,AVAILABLE,@Spark}
22/10/10 05:39:46 INFO ContextHandler: Started
o.s.j.s.ServletContextHandler@60dd3c23{/storage,null,AVAILABLE,@Spark}
22/10/10 05:39:46 INFO ContextHandler: Started
o.s.j.s.ServletContextHandler@5e9456ae{/storage/json,null,AVAILABLE,@Spark}
22/10/10 05:39:46 INFO ContextHandler: Started
o.s.j.s.ServletContextHandler@1f1cae23{/storage/rdd,null,AVAILABLE,@Spark}
22/10/10 05:39:46 INFO ContextHandler: Started
o.s.j.s.ServletContextHandler@985696{/storage/rdd/json,null,AVAILABLE,@Spark}
22/10/10 05:39:46 INFO ContextHandler: Started
o.s.j.s.ServletContextHandler@215a34b4{/environment,null,AVAILABLE,@Spark}
22/10/10 05:39:46 INFO ContextHandler: Started
o.s.j.s.ServletContextHandler@35d3ab60{/environment/json,null,AVAILABLE,@Spark}
22/10/10 05:39:46 INFO ContextHandler: Started
o.s.j.s.ServletContextHandler@71870da7{/executors,null,AVAILABLE,@Spark}
22/10/10 05:39:46 INFO ContextHandler: Started
o.s.j.s.ServletContextHandler@45792847{/executors/json,null,AVAILABLE,@Spark}
22/10/10 05:39:46 INFO ContextHandler: Started
o.s.j.s.ServletContextHandler@4e25147a{/executors/threadDump,null,AVAILABLE,@Spark}
22/10/10 05:39:46 INFO ContextHandler: Started
o.s.j.s.ServletContextHandler@675ffd1d{/executors/threadDump/json,null,AVAILABLE,@Spark}
22/10/10 05:39:46 INFO ContextHandler: Started
o.s.j.s.ServletContextHandler@30506c0d{/static,null,AVAILABLE,@Spark}
22/10/10 05:39:46 INFO ContextHandler: Started
o.s.j.s.ServletContextHandler@771db12c{/,null,AVAILABLE,@Spark}
22/10/10 05:39:46 INFO ContextHandler: Started
o.s.j.s.ServletContextHandler@26ae880a{/api,null,AVAILABLE,@Spark}
22/10/10 05:39:46 INFO ContextHandler: Started
o.s.j.s.ServletContextHandler@5c645b43{/jobs/job/kill,null,AVAILABLE,@Spark}
22/10/10 05:39:46 INFO ContextHandler: Started
o.s.j.s.ServletContextHandler@298d9a05{/stages/stage/kill,null,AVAILABLE,@Spark}
22/10/10 05:39:46 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at
http://ip-10-224-51-45.ap-south-1.compute.internal:4040
22/10/10 05:39:46 INFO SparkContext: Added JAR
file:/usr/lib/hudi/cli/hudi-cli-0.10.1-amzn-0.jar at
spark://ip-10-224-51-45.ap-south-1.compute.internal:37043/jars/hudi-cli-0.10.1-amzn-0.jar
with timestamp 1665380385102
22/10/10 05:39:46 INFO Executor: Starting executor ID driver on host
ip-10-224-51-45.ap-south-1.compute.internal
22/10/10 05:39:46 INFO Executor: Fetching
spark://ip-10-224-51-45.ap-south-1.compute.internal:37043/jars/hudi-cli-0.10.1-amzn-0.jar
with timestamp 1665380385102
22/10/10 05:39:46 INFO TransportClientFactory: Successfully created
connection to ip-10-224-51-45.ap-south-1.compute.internal/10.224.51.45:37043
after 29 ms (0 ms spent in bootstraps)
22/10/10 05:39:46 INFO Utils: Fetching
spark://ip-10-224-51-45.ap-south-1.compute.internal:37043/jars/hudi-cli-0.10.1-amzn-0.jar
to
/mnt/tmp/spark-52a5a695-a32e-4d74-bf11-83563425004c/userFiles-7dc5a66f-473c-4f03-bf5a-906c46e504d5/fetchFileTemp7148092357809489685.tmp
22/10/10 05:39:46 INFO Executor: Adding
file:/mnt/tmp/spark-52a5a695-a32e-4d74-bf11-83563425004c/userFiles-7dc5a66f-473c-4f03-bf5a-906c46e504d5/hudi-cli-0.10.1-amzn-0.jar
to class loader
22/10/10 05:39:46 INFO Utils: Successfully started service
'org.apache.spark.network.netty.NettyBlockTransferService' on port 43033.
22/10/10 05:39:46 INFO NettyBlockTransferService: Server created on
ip-10-224-51-45.ap-south-1.compute.internal:43033
22/10/10 05:39:46 INFO BlockManager: Using
org.apache.spark.storage.RandomBlockReplicationPolicy for block replication
policy
22/10/10 05:39:46 INFO BlockManagerMaster: Registering BlockManager
BlockManagerId(driver, ip-10-224-51-45.ap-south-1.compute.internal, 43033, None)
22/10/10 05:39:46 INFO BlockManagerMasterEndpoint: Registering block manager
ip-10-224-51-45.ap-south-1.compute.internal:43033 with 912.3 MiB RAM,
BlockManagerId(driver, ip-10-224-51-45.ap-south-1.compute.internal, 43033, None)
22/10/10 05:39:46 INFO BlockManagerMaster: Registered BlockManager
BlockManagerId(driver, ip-10-224-51-45.ap-south-1.compute.internal, 43033, None)
22/10/10 05:39:46 INFO BlockManager: external shuffle service port = 7337
22/10/10 05:39:46 INFO BlockManager: Initialized BlockManager:
BlockManagerId(driver, ip-10-224-51-45.ap-south-1.compute.internal, 43033, None)
22/10/10 05:39:46 INFO ContextHandler: Started
o.s.j.s.ServletContextHandler@75a118e6{/metrics/json,null,AVAILABLE,@Spark}
22/10/10 05:39:47 INFO ClientConfigurationFactory: Set initial getObject
socket timeout to 2000 ms.
22/10/10 05:39:47 INFO log: Logging initialized @4588ms to
org.eclipse.jetty.util.log.Slf4jLog
22/10/10 05:39:48 INFO Javalin:
__ __ _
/ /____ _ _ __ ____ _ / /(_)____
__ / // __ `/| | / // __ `// // // __ \
/ /_/ // /_/ / | |/ // /_/ // // // / / /
\____/ \__,_/ |___/ \__,_//_//_//_/ /_/
hudi:customer_event_offline_v1->
https://javalin.io/documentation
hudi:customer_event_offline_v1->
22/10/10 05:39:48 INFO Javalin: Starting Javalin ...
22/10/10 05:39:48 INFO Server: jetty-9.4.43.v20210629; built:
2021-06-30T11:07:22.254Z; git: 526006ecfa3af7f1a27ef3a288e2bef7ea9dd7e8; jvm
1.8.0_342-b07
22/10/10 05:39:48 INFO Server: Started @5001ms
22/10/10 05:39:48 INFO Javalin: Listening on http://localhost:36465/
22/10/10 05:39:48 INFO Javalin: Javalin started in 180ms \o/
22/10/10 05:39:49 INFO S3NativeFileSystem: Opening
's3://test-spark-hudi/test_campaign_event_offline_compact_v1/.hoodie/hoodie.properties'
for reading
22/10/10 05:39:49 INFO AbstractConnector: Stopped Spark@ecfbe91{HTTP/1.1,
(http/1.1)}{0.0.0.0:4040}
22/10/10 05:39:49 INFO SparkUI: Stopped Spark web UI at
http://ip-10-224-51-45.ap-south-1.compute.internal:4040
22/10/10 05:39:49 INFO MapOutputTrackerMasterEndpoint:
MapOutputTrackerMasterEndpoint stopped!
22/10/10 05:39:49 INFO MemoryStore: MemoryStore cleared
22/10/10 05:39:49 INFO BlockManager: BlockManager stopped
22/10/10 05:39:49 INFO BlockManagerMaster: BlockManagerMaster stopped
22/10/10 05:39:49 INFO
OutputCommitCoordinator$OutputCommitCoordinatorEndpoint:
OutputCommitCoordinator stopped!
22/10/10 05:39:50 INFO SparkContext: Successfully stopped SparkContext
22/10/10 05:39:50 INFO ShutdownHookManager: Shutdown hook called
22/10/10 05:39:50 INFO ShutdownHookManager: Deleting directory
/mnt/tmp/spark-fe542e3b-8ab1-468a-b8af-cfa58eef245c
22/10/10 05:39:50 INFO ShutdownHookManager: Deleting directory
/mnt/tmp/spark-52a5a695-a32e-4d74-bf11-83563425004c
**Attempted to schedule compaction for 20221010053942582**
hudi:customer_event_offline_v1->compaction run --parallelism 2
--schemaFilePath "s3://test-spark-hudi/schema/offline_compact.avsc"
--compactionInstant 20221010053942582
22/10/10 05:40:14 INFO SparkContext: Running Spark version 3.2.0-amzn-0
22/10/10 05:40:14 INFO ResourceUtils:
==============================================================
22/10/10 05:40:14 INFO ResourceUtils: No custom resources configured for
spark.driver.
22/10/10 05:40:14 INFO ResourceUtils:
==============================================================
22/10/10 05:40:14 INFO SparkContext: Submitted application:
hoodie-cli-COMPACT_RUN
22/10/10 05:40:14 INFO ResourceProfile: Default ResourceProfile created,
executor resources: Map(cores -> name: cores, amount: 4, script: , vendor: ,
memory -> name: memory, amount: 4096, script: , vendor: , offHeap -> name:
offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name:
cpus, amount: 1.0)
22/10/10 05:40:14 INFO ResourceProfile: Limiting resource is cpus at 4 tasks
per executor
22/10/10 05:40:14 INFO ResourceProfileManager: Added ResourceProfile id: 0
22/10/10 05:40:14 INFO SecurityManager: Changing view acls to: hadoop
22/10/10 05:40:14 INFO SecurityManager: Changing modify acls to: hadoop
22/10/10 05:40:14 INFO SecurityManager: Changing view acls groups to:
22/10/10 05:40:14 INFO SecurityManager: Changing modify acls groups to:
22/10/10 05:40:14 INFO SecurityManager: SecurityManager: authentication
disabled; ui acls disabled; users with view permissions: Set(hadoop); groups
with view permissions: Set(); users with modify permissions: Set(hadoop);
groups with modify permissions: Set()
22/10/10 05:40:14 INFO deprecation: mapred.output.compression.codec is
deprecated. Instead, use mapreduce.output.fileoutputformat.compress.codec
22/10/10 05:40:14 INFO deprecation: mapred.output.compression.type is
deprecated. Instead, use mapreduce.output.fileoutputformat.compress.type
22/10/10 05:40:14 INFO deprecation: mapred.output.compress is deprecated.
Instead, use mapreduce.output.fileoutputformat.compress
22/10/10 05:40:14 INFO Utils: Successfully started service 'sparkDriver' on
port 34943.
22/10/10 05:40:14 INFO SparkEnv: Registering MapOutputTracker
22/10/10 05:40:14 INFO SparkEnv: Registering BlockManagerMaster
22/10/10 05:40:14 INFO BlockManagerMasterEndpoint: Using
org.apache.spark.storage.DefaultTopologyMapper for getting topology information
22/10/10 05:40:14 INFO BlockManagerMasterEndpoint:
BlockManagerMasterEndpoint up
22/10/10 05:40:14 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
22/10/10 05:40:14 INFO DiskBlockManager: Created local directory at
/mnt/tmp/blockmgr-a0fd4166-022f-44a5-9709-33e6e390c281
22/10/10 05:40:14 INFO MemoryStore: MemoryStore started with capacity 912.3
MiB
22/10/10 05:40:14 INFO SparkEnv: Registering OutputCommitCoordinator
22/10/10 05:40:14 INFO SubResultCacheManager: Sub-result caches are disabled.
22/10/10 05:40:14 INFO log: Logging initialized @2733ms to
org.sparkproject.jetty.util.log.Slf4jLog
22/10/10 05:40:14 INFO Server: jetty-9.4.43.v20210629; built:
2021-06-30T11:07:22.254Z; git: 526006ecfa3af7f1a27ef3a288e2bef7ea9dd7e8; jvm
1.8.0_342-b07
22/10/10 05:40:14 INFO Server: Started @2841ms
22/10/10 05:40:14 INFO AbstractConnector: Started
ServerConnector@150466c4{HTTP/1.1, (http/1.1)}{0.0.0.0:4040}
22/10/10 05:40:14 INFO Utils: Successfully started service 'SparkUI' on port
4040.
22/10/10 05:40:15 INFO ContextHandler: Started
o.s.j.s.ServletContextHandler@566d0c69{/jobs,null,AVAILABLE,@Spark}
22/10/10 05:40:15 INFO ContextHandler: Started
o.s.j.s.ServletContextHandler@bdc8014{/jobs/json,null,AVAILABLE,@Spark}
22/10/10 05:40:15 INFO ContextHandler: Started
o.s.j.s.ServletContextHandler@73ba6fe6{/jobs/job,null,AVAILABLE,@Spark}
22/10/10 05:40:15 INFO ContextHandler: Started
o.s.j.s.ServletContextHandler@87abc48{/jobs/job/json,null,AVAILABLE,@Spark}
22/10/10 05:40:15 INFO ContextHandler: Started
o.s.j.s.ServletContextHandler@782168b7{/stages,null,AVAILABLE,@Spark}
22/10/10 05:40:15 INFO ContextHandler: Started
o.s.j.s.ServletContextHandler@7435a578{/stages/json,null,AVAILABLE,@Spark}
22/10/10 05:40:15 INFO ContextHandler: Started
o.s.j.s.ServletContextHandler@13047d7d{/stages/stage,null,AVAILABLE,@Spark}
22/10/10 05:40:15 INFO ContextHandler: Started
o.s.j.s.ServletContextHandler@2b214b94{/stages/stage/json,null,AVAILABLE,@Spark}
22/10/10 05:40:15 INFO ContextHandler: Started
o.s.j.s.ServletContextHandler@49601f82{/stages/pool,null,AVAILABLE,@Spark}
22/10/10 05:40:15 INFO ContextHandler: Started
o.s.j.s.ServletContextHandler@2b8d084{/stages/pool/json,null,AVAILABLE,@Spark}
22/10/10 05:40:15 INFO ContextHandler: Started
o.s.j.s.ServletContextHandler@24fabd0f{/storage,null,AVAILABLE,@Spark}
22/10/10 05:40:15 INFO ContextHandler: Started
o.s.j.s.ServletContextHandler@61f3fbb8{/storage/json,null,AVAILABLE,@Spark}
22/10/10 05:40:15 INFO ContextHandler: Started
o.s.j.s.ServletContextHandler@432034a{/storage/rdd,null,AVAILABLE,@Spark}
22/10/10 05:40:15 INFO ContextHandler: Started
o.s.j.s.ServletContextHandler@60e5272{/storage/rdd/json,null,AVAILABLE,@Spark}
22/10/10 05:40:15 INFO ContextHandler: Started
o.s.j.s.ServletContextHandler@69c93ca4{/environment,null,AVAILABLE,@Spark}
22/10/10 05:40:15 INFO ContextHandler: Started
o.s.j.s.ServletContextHandler@173373b4{/environment/json,null,AVAILABLE,@Spark}
22/10/10 05:40:15 INFO ContextHandler: Started
o.s.j.s.ServletContextHandler@60dd3c23{/executors,null,AVAILABLE,@Spark}
22/10/10 05:40:15 INFO ContextHandler: Started
o.s.j.s.ServletContextHandler@5e9456ae{/executors/json,null,AVAILABLE,@Spark}
22/10/10 05:40:15 INFO ContextHandler: Started
o.s.j.s.ServletContextHandler@1f1cae23{/executors/threadDump,null,AVAILABLE,@Spark}
22/10/10 05:40:15 INFO ContextHandler: Started
o.s.j.s.ServletContextHandler@985696{/executors/threadDump/json,null,AVAILABLE,@Spark}
22/10/10 05:40:15 INFO ContextHandler: Started
o.s.j.s.ServletContextHandler@215a34b4{/static,null,AVAILABLE,@Spark}
22/10/10 05:40:15 INFO ContextHandler: Started
o.s.j.s.ServletContextHandler@14fc5d40{/,null,AVAILABLE,@Spark}
22/10/10 05:40:15 INFO ContextHandler: Started
o.s.j.s.ServletContextHandler@47d7bfb3{/api,null,AVAILABLE,@Spark}
22/10/10 05:40:15 INFO ContextHandler: Started
o.s.j.s.ServletContextHandler@5f13be1{/jobs/job/kill,null,AVAILABLE,@Spark}
22/10/10 05:40:15 INFO ContextHandler: Started
o.s.j.s.ServletContextHandler@50d3bf39{/stages/stage/kill,null,AVAILABLE,@Spark}
22/10/10 05:40:15 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at
http://ip-10-224-51-45.ap-south-1.compute.internal:4040
22/10/10 05:40:15 INFO SparkContext: Added JAR
file:/usr/lib/hudi/cli/hudi-cli-0.10.1-amzn-0.jar at
spark://ip-10-224-51-45.ap-south-1.compute.internal:34943/jars/hudi-cli-0.10.1-amzn-0.jar
with timestamp 1665380413987
22/10/10 05:40:15 INFO Executor: Starting executor ID driver on host
ip-10-224-51-45.ap-south-1.compute.internal
22/10/10 05:40:15 INFO Executor: Fetching
spark://ip-10-224-51-45.ap-south-1.compute.internal:34943/jars/hudi-cli-0.10.1-amzn-0.jar
with timestamp 1665380413987
22/10/10 05:40:15 INFO TransportClientFactory: Successfully created
connection to ip-10-224-51-45.ap-south-1.compute.internal/10.224.51.45:34943
after 29 ms (0 ms spent in bootstraps)
22/10/10 05:40:15 INFO Utils: Fetching
spark://ip-10-224-51-45.ap-south-1.compute.internal:34943/jars/hudi-cli-0.10.1-amzn-0.jar
to
/mnt/tmp/spark-1a761ac7-6903-41e9-8c8b-8d591f36d810/userFiles-1cc44228-430f-4503-a49c-77d959ead06b/fetchFileTemp3032838916835851411.tmp
22/10/10 05:40:15 INFO Executor: Adding
file:/mnt/tmp/spark-1a761ac7-6903-41e9-8c8b-8d591f36d810/userFiles-1cc44228-430f-4503-a49c-77d959ead06b/hudi-cli-0.10.1-amzn-0.jar
to class loader
22/10/10 05:40:15 INFO Utils: Successfully started service
'org.apache.spark.network.netty.NettyBlockTransferService' on port 46483.
22/10/10 05:40:15 INFO NettyBlockTransferService: Server created on
ip-10-224-51-45.ap-south-1.compute.internal:46483
22/10/10 05:40:15 INFO BlockManager: Using
org.apache.spark.storage.RandomBlockReplicationPolicy for block replication
policy
22/10/10 05:40:15 INFO BlockManagerMaster: Registering BlockManager
BlockManagerId(driver, ip-10-224-51-45.ap-south-1.compute.internal, 46483, None)
22/10/10 05:40:15 INFO BlockManagerMasterEndpoint: Registering block manager
ip-10-224-51-45.ap-south-1.compute.internal:46483 with 912.3 MiB RAM,
BlockManagerId(driver, ip-10-224-51-45.ap-south-1.compute.internal, 46483, None)
22/10/10 05:40:15 INFO BlockManagerMaster: Registered BlockManager
BlockManagerId(driver, ip-10-224-51-45.ap-south-1.compute.internal, 46483, None)
22/10/10 05:40:15 INFO BlockManager: external shuffle service port = 7337
22/10/10 05:40:15 INFO BlockManager: Initialized BlockManager:
BlockManagerId(driver, ip-10-224-51-45.ap-south-1.compute.internal, 46483, None)
22/10/10 05:40:15 INFO ContextHandler: Started
o.s.j.s.ServletContextHandler@61d84e08{/metrics/json,null,AVAILABLE,@Spark}
22/10/10 05:40:16 INFO ClientConfigurationFactory: Set initial getObject
socket timeout to 2000 ms.
22/10/10 05:40:17 INFO S3NativeFileSystem: Opening
's3://test-spark-hudi/schema/offline_compact.avsc' for reading
22/10/10 05:40:17 INFO log: Logging initialized @5802ms to
org.eclipse.jetty.util.log.Slf4jLog
22/10/10 05:40:17 INFO Javalin:
__ __ _
/ /____ _ _ __ ____ _ / /(_)____
__ / // __ `/| | / // __ `// // // __ \
/ /_/ // /_/ / | |/ // /_/ // // // / / /
\____/ \__,_/ |___/ \__,_//_//_//_/ /_/
hudi:customer_event_offline_v1->
https://javalin.io/documentation
hudi:customer_event_offline_v1->
22/10/10 05:40:17 INFO Javalin: Starting Javalin ...
22/10/10 05:40:18 INFO Server: jetty-9.4.43.v20210629; built:
2021-06-30T11:07:22.254Z; git: 526006ecfa3af7f1a27ef3a288e2bef7ea9dd7e8; jvm
1.8.0_342-b07
22/10/10 05:40:18 INFO Server: Started @6068ms
22/10/10 05:40:18 INFO Javalin: Listening on http://localhost:42971/
22/10/10 05:40:18 INFO Javalin: Javalin started in 162ms \o/
22/10/10 05:40:18 INFO S3NativeFileSystem: Opening
's3://test-spark-hudi/test_campaign_event_offline_compact_v1/.hoodie/hoodie.properties'
for reading
22/10/10 05:40:18 INFO S3NativeFileSystem: Opening
's3://test-spark-hudi/test_campaign_event_offline_compact_v1/.hoodie/20221007132341606.deltacommit'
for reading
22/10/10 05:40:19 ERROR UtilHelpers: Compact failed
java.lang.IllegalStateException: No Compaction request available at 20221010053942582 to run compaction
    at org.apache.hudi.table.action.compact.HoodieSparkMergeOnReadTableCompactor.preCompact(HoodieSparkMergeOnReadTableCompactor.java:49)
    at org.apache.hudi.table.action.compact.RunCompactionActionExecutor.execute(RunCompactionActionExecutor.java:64)
    at org.apache.hudi.table.HoodieSparkMergeOnReadTable.compact(HoodieSparkMergeOnReadTable.java:143)
    at org.apache.hudi.client.SparkRDDWriteClient.compact(SparkRDDWriteClient.java:341)
    at org.apache.hudi.client.SparkRDDWriteClient.compact(SparkRDDWriteClient.java:75)
    at org.apache.hudi.client.AbstractHoodieWriteClient.compact(AbstractHoodieWriteClient.java:860)
    at org.apache.hudi.utilities.HoodieCompactor.doCompact(HoodieCompactor.java:156)
    at org.apache.hudi.utilities.HoodieCompactor.lambda$compact$0(HoodieCompactor.java:130)
    at org.apache.hudi.utilities.UtilHelpers.retry(UtilHelpers.java:488)
    at org.apache.hudi.utilities.HoodieCompactor.compact(HoodieCompactor.java:123)
    at org.apache.hudi.cli.commands.SparkMain.compact(SparkMain.java:336)
    at org.apache.hudi.cli.commands.SparkMain.main(SparkMain.java:130)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1000)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1089)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1098)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
22/10/10 05:40:19 INFO AbstractConnector: Stopped Spark@150466c4{HTTP/1.1,
(http/1.1)}{0.0.0.0:4040}
22/10/10 05:40:19 INFO SparkUI: Stopped Spark web UI at
http://ip-10-224-51-45.ap-south-1.compute.internal:4040
22/10/10 05:40:19 INFO MapOutputTrackerMasterEndpoint:
MapOutputTrackerMasterEndpoint stopped!
22/10/10 05:40:19 INFO MemoryStore: MemoryStore cleared
22/10/10 05:40:19 INFO BlockManager: BlockManager stopped
22/10/10 05:40:19 INFO BlockManagerMaster: BlockManagerMaster stopped
22/10/10 05:40:19 INFO
OutputCommitCoordinator$OutputCommitCoordinatorEndpoint:
OutputCommitCoordinator stopped!
22/10/10 05:40:19 INFO SparkContext: Successfully stopped SparkContext
22/10/10 05:40:19 INFO ShutdownHookManager: Shutdown hook called
22/10/10 05:40:19 INFO ShutdownHookManager: Deleting directory
/mnt/tmp/spark-1a761ac7-6903-41e9-8c8b-8d591f36d810
22/10/10 05:40:19 INFO ShutdownHookManager: Deleting directory
/mnt/tmp/spark-8c125f58-c509-466f-9c47-e2d2a4718540
Failed to run compaction for 20221010053942582
```
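The `No Compaction request available at 20221010053942582` error suggests that no pending compaction plan exists on the timeline for that instant, even though the CLI printed "Attempted to schedule compaction". A small pre-flight check along the following lines could confirm this before calling `compaction run`. This is only a sketch: the helper names are hypothetical, and it assumes pending plans show up in `.hoodie` as `<instant>.compaction.requested` or `<instant>.compaction.inflight`.

```python
# Assumed suffixes of pending compaction files in the .hoodie folder.
PENDING_SUFFIXES = (".compaction.requested", ".compaction.inflight")


def pending_compaction_instants(file_names):
    """Return the sorted instants that have a pending compaction file."""
    instants = set()
    for name in file_names:
        for suffix in PENDING_SUFFIXES:
            if name.endswith(suffix):
                instants.add(name[: -len(suffix)])
    return sorted(instants)


def can_run_compaction(file_names, instant):
    """True only if a pending compaction exists for exactly this instant."""
    return instant in pending_compaction_instants(file_names)


# The only timeline file visible in the log above is a delta commit, so under
# these assumptions the requested instant has nothing to run against.
timeline = ["hoodie.properties", "20221007132341606.deltacommit"]
print(can_run_compaction(timeline, "20221010053942582"))
```

If the check returns `False` right after `compaction schedule`, the scheduling step itself produced no plan, and the failure in `compaction run` is a consequence rather than a separate problem.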