hanson2021 opened a new issue, #10538:
URL: https://github.com/apache/hudi/issues/10538

   
   (1) **Environment Description**
   * Hudi version : 0.12.0
   * Flink version : 1.14.4
   * Hive version : 3.1.2
   * Hadoop version : 3.1.2
   * Storage (HDFS/S3/GCS..) : HDFS
   * Running on Docker? (yes/no) : no
   
   (2) My Hudi sink table is created as follows:
    CREATE TABLE bidwhive.birt_test.ods_thasu_short_url_log_inc_test(
    uuid string PRIMARY KEY NOT ENFORCED,
    `binlog_id` bigint  comment 'log id',
    `id` bigint  comment 'primary key',
    `short_id` bigint  comment 'short link id',
    `short_key` string  comment 'short link suffix',
    `short_base_key` string  comment 'short link table base key',
    `user_agent` string  comment 'ua',
    `user_ip` string  comment 'user ip',
    `created_at` string  comment 'creation time',
    `cdc_type` string  comment 'CDC type',
    `es` string   comment 'binlog execution time',
    `ts` string  comment 'time at which mario fetched this record to send it to the queue',
    `etl_time` string  comment 'ETL sync time',
    `dt` string  comment 'partition field'
    ) PARTITIONED BY (`dt`) WITH (
    'connector' = 'hudi',
    'path' = 'hdfs://ahdpns/user/hive/warehouse/bi_test.db/ods_thasu_short_url_log_inc_test',
    'table.type' = 'COPY_ON_WRITE',
    'hoodie.datasource.write.recordkey.field' = 'uuid',
    'write.precombine.field' = 'created_at',
    'index.state.ttl' = '0.0',
    'index.type' = 'FLINK_STATE',
    'write.operation' = 'insert',
    'write.task.max.size' = '1024',
    'write.tasks' = '12',
    'hoodie.parquet.compression.codec' = 'snappy',
    'hoodie.cleaner.policy.failed.writes'='LAZY',
    'hoodie.write.concurrency.mode' = 'optimistic_concurrency_control',
    'hoodie.write.lock.wait_time_ms'='120000',
    --  'hoodie.write.lock.provider' = 'org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider',
    --  'hoodie.write.lock.zookeeper.url'='host01:2181,host02:2181,host03:2181',
    --  'hoodie.write.lock.zookeeper.port'='2181',
    --  'hoodie.write.lock.zookeeper.base_path'='/huditest',
    --  'hoodie.write.lock.zookeeper.lock_key'='test',
    --  'hoodie.write.lock.zookeeper.session_timeout_ms'='60000',
    --  'hoodie.write.lock.provider'='org.apache.hudi.hive.transaction.lock.HiveMetastoreBasedLockProvider',
    --  'hoodie.write.lock.hivemetastore.uris'='thrift://host02:9083,thrift://host01:9083,thrift://host03:9083',
    --  'hoodie.write.lock.hivemetastore.database'='bi_test',
    --  'hoodie.write.lock.hivemetastore.table'='testlock',
    'hoodie.write.lock.provider'='org.apache.hudi.client.transaction.lock.FileSystemBasedLockProvider',
    --  'hoodie.write.lock.filesystem.path'='hdfs://ahdpns/user/hive/warehouse/bi_test.db/ods_thasu_short_url_log_inc_test',  -- default: hoodie.base.path + /.hoodie/lock
    'hoodie.write.lock.filesystem.expire'='0',
    'hive_sync.enable' = 'true',
    'hive_sync.mode' = 'hms',
    'hive_sync.metastore.uris' = 'thrift://host02:9083,thrift://host03:9083,thrift://host01:9083',
    'hive_sync.table' = 'ods_thasu_short_url_log_inc_test',
    'hive_sync.db' = 'bi_test')
    ;
   
   
   
   (3) When I select FileSystemBasedLockProvider, the exception log is as follows:
   2024-01-19 17:20:25,056 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator    [] - Checkpoint 2 of job 53c0175fec27445402621d4a05d5d797 expired before completing.
   2024-01-19 17:22:37,314 ERROR org.apache.hudi.sink.StreamWriteOperatorCoordinator          [] - Executor executes action [commits the instant 20240119171545400] error
   org.apache.hudi.exception.HoodieLockException: Unable to acquire lock, lock object hdfs://ahdpns/user/hive/warehouse/bi_test.db/ods_thasu_short_url_log_inc_test/.hoodie/.aux/lock
        at org.apache.hudi.client.transaction.lock.LockManager.lock(LockManager.java:87) ~[hudi-flink1.14-bundle-0.12.0.jar:0.12.0]
        at org.apache.hudi.client.transaction.TransactionManager.beginTransaction(TransactionManager.java:53) ~[hudi-flink1.14-bundle-0.12.0.jar:0.12.0]
        at org.apache.hudi.client.BaseHoodieWriteClient.commitStats(BaseHoodieWriteClient.java:232) ~[hudi-flink1.14-bundle-0.12.0.jar:0.12.0]
        at org.apache.hudi.client.HoodieFlinkWriteClient.commit(HoodieFlinkWriteClient.java:117) ~[hudi-flink1.14-bundle-0.12.0.jar:0.12.0]
        at org.apache.hudi.sink.StreamWriteOperatorCoordinator.doCommit(StreamWriteOperatorCoordinator.java:530) ~[hudi-flink1.14-bundle-0.12.0.jar:0.12.0]
        at org.apache.hudi.sink.StreamWriteOperatorCoordinator.commitInstant(StreamWriteOperatorCoordinator.java:506) ~[hudi-flink1.14-bundle-0.12.0.jar:0.12.0]
        at org.apache.hudi.sink.StreamWriteOperatorCoordinator.lambda$notifyCheckpointComplete$2(StreamWriteOperatorCoordinator.java:242) ~[hudi-flink1.14-bundle-0.12.0.jar:0.12.0]
        at org.apache.hudi.sink.utils.NonThrownExecutor.lambda$wrapAction$0(NonThrownExecutor.java:130) ~[hudi-flink1.14-bundle-0.12.0.jar:0.12.0]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_191]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_191]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_191]
   2024-01-19 17:22:37,330 INFO  org.apache.hudi.sink.StreamWriteOperatorCoordinator          [] - Executor executes action [taking checkpoint 2] success!
   2024-01-19 17:22:37,342 INFO  org.apache.flink.runtime.jobmaster.JobMaster                 [] - Trying to recover from a global failure.
   org.apache.flink.util.FlinkException: Global failure triggered by OperatorCoordinator for 'hoodie_append_write: ods_thasu_short_url_log_inc_test' (operator e9bb29a2d1826b2ea3ef409fecfcbfde).
        at org.apache.flink.runtime.operators.coordination.OperatorCoordinatorHolder$LazyInitializedCoordinatorContext.failJob(OperatorCoordinatorHolder.java:545) ~[flink-dist_2.12-1.14.4.jar:1.14.4]
        at org.apache.hudi.sink.StreamWriteOperatorCoordinator.lambda$start$0(StreamWriteOperatorCoordinator.java:187) ~[hudi-flink1.14-bundle-0.12.0.jar:0.12.0]
        at org.apache.hudi.sink.utils.NonThrownExecutor.handleException(NonThrownExecutor.java:146) ~[hudi-flink1.14-bundle-0.12.0.jar:0.12.0]
        at org.apache.hudi.sink.utils.NonThrownExecutor.lambda$wrapAction$0(NonThrownExecutor.java:133) ~[hudi-flink1.14-bundle-0.12.0.jar:0.12.0]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_191]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_191]
        at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_191]
   Caused by: org.apache.hudi.exception.HoodieException: Executor executes action [commits the instant 20240119171545400] error
        ... 6 more
   Caused by: org.apache.hudi.exception.HoodieLockException: Unable to acquire lock, lock object hdfs://ahdpns/user/hive/warehouse/bi_test.db/ods_thasu_short_url_log_inc_test/.hoodie/.aux/lock
        at org.apache.hudi.client.transaction.lock.LockManager.lock(LockManager.java:87) ~[hudi-flink1.14-bundle-0.12.0.jar:0.12.0]
        at org.apache.hudi.client.transaction.TransactionManager.beginTransaction(TransactionManager.java:53) ~[hudi-flink1.14-bundle-0.12.0.jar:0.12.0]
        at org.apache.hudi.client.BaseHoodieWriteClient.commitStats(BaseHoodieWriteClient.java:232) ~[hudi-flink1.14-bundle-0.12.0.jar:0.12.0]
        at org.apache.hudi.client.HoodieFlinkWriteClient.commit(HoodieFlinkWriteClient.java:117) ~[hudi-flink1.14-bundle-0.12.0.jar:0.12.0]
        at org.apache.hudi.sink.StreamWriteOperatorCoordinator.doCommit(StreamWriteOperatorCoordinator.java:530) ~[hudi-flink1.14-bundle-0.12.0.jar:0.12.0]
        at org.apache.hudi.sink.StreamWriteOperatorCoordinator.commitInstant(StreamWriteOperatorCoordinator.java:506) ~[hudi-flink1.14-bundle-0.12.0.jar:0.12.0]
        at org.apache.hudi.sink.StreamWriteOperatorCoordinator.lambda$notifyCheckpointComplete$2(StreamWriteOperatorCoordinator.java:242) ~[hudi-flink1.14-bundle-0.12.0.jar:0.12.0]
        at org.apache.hudi.sink.utils.NonThrownExecutor.lambda$wrapAction$0(NonThrownExecutor.java:130) ~[hudi-flink1.14-bundle-0.12.0.jar:0.12.0]
   
   (4) When I select ZookeeperBasedLockProvider, the exception log is as follows:
   org.apache.hudi.exception.HoodieLockException: Unable to acquire lock, lock object
        at org.apache.hudi.client.transaction.lock.LockManager.lock(LockManager.java:75) ~[hudi-flink1.14-bundle-0.12.0.jar:0.12.0]
        at org.apache.hudi.client.HoodieFlinkWriteClient.writeTableMetadata(HoodieFlinkWriteClient.java:272) ~[hudi-flink1.14-bundle-0.12.0.jar:0.12.0]
        at org.apache.hudi.client.BaseHoodieWriteClient.commit(BaseHoodieWriteClient.java:271) ~[hudi-flink1.14-bundle-0.12.0.jar:0.12.0]
        at org.apache.hudi.client.BaseHoodieWriteClient.commitStats(BaseHoodieWriteClient.java:236) ~[hudi-flink1.14-bundle-0.12.0.jar:0.12.0]
        at org.apache.hudi.client.HoodieFlinkWriteClient.commit(HoodieFlinkWriteClient.java:117) ~[hudi-flink1.14-bundle-0.12.0.jar:0.12.0]
        at org.apache.hudi.sink.StreamWriteOperatorCoordinator.doCommit(StreamWriteOperatorCoordinator.java:530) ~[hudi-flink1.14-bundle-0.12.0.jar:0.12.0]
        at org.apache.hudi.sink.StreamWriteOperatorCoordinator.commitInstant(StreamWriteOperatorCoordinator.java:506) ~[hudi-flink1.14-bundle-0.12.0.jar:0.12.0]
        at org.apache.hudi.sink.StreamWriteOperatorCoordinator.lambda$notifyCheckpointComplete$2(StreamWriteOperatorCoordinator.java:242) ~[hudi-flink1.14-bundle-0.12.0.jar:0.12.0]
        at org.apache.hudi.sink.utils.NonThrownExecutor.lambda$wrapAction$0(NonThrownExecutor.java:130) ~[hudi-flink1.14-bundle-0.12.0.jar:0.12.0]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_191]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_191]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_191]
   Caused by: org.apache.hudi.exception.HoodieLockException: FAILED_TO_ACQUIRE lock atZkBasePath = /huditest, lock key = test
        at org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider.tryLock(ZookeeperBasedLockProvider.java:101) ~[hudi-flink1.14-bundle-0.12.0.jar:0.12.0]
        at org.apache.hudi.client.transaction.lock.LockManager.lock(LockManager.java:67) ~[hudi-flink1.14-bundle-0.12.0.jar:0.12.0]
        ... 11 more
   Caused by: java.lang.IllegalArgumentException: ALREADY_ACQUIRED lock atZkBasePath = /huditest, lock key = test
        at org.apache.hudi.common.util.ValidationUtils.checkArgument(ValidationUtils.java:40) ~[hudi-flink1.14-bundle-0.12.0.jar:0.12.0]
        at org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider.acquireLock(ZookeeperBasedLockProvider.java:140) ~[hudi-flink1.14-bundle-0.12.0.jar:0.12.0]
        at org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider.tryLock(ZookeeperBasedLockProvider.java:96) ~[hudi-flink1.14-bundle-0.12.0.jar:0.12.0]
        at org.apache.hudi.client.transaction.lock.LockManager.lock(LockManager.java:67) ~[hudi-flink1.14-bundle-0.12.0.jar:0.12.0]
   
   (5) Can anybody help me solve this?
   

