Re: [PR] HIVE-28578: Concurrency issue in updateTableColumnStatistics [hive]

via GitHub Mon, 27 Oct 2025 18:14:29 -0700


dengzhhu653 commented on PR #5929:
URL: https://github.com/apache/hive/pull/5929#issuecomment-3454010825


   > -- MVCC style (optimistic)
   BEGIN;
   SELECT balance FROM accounts WHERE id = 1; -- snapshot read
   -- some processing
   UPDATE accounts SET balance = balance - 100 WHERE id = 1;
   COMMIT;
   -- If another transaction modified same row → commit fails (conflict)
   
   As I explained, this would cause a lost update.
   After some research, I found it is hard to do this safely with DataNucleus 
without a pure "UPDATE" query:
   
         mTable.setLastAccessTime((int) (System.currentTimeMillis()/1000));
         pm.flush(); // here might flush the old MTable into data store
         pm.refresh(mTable);
   In pm.flush(), the old state of the table can overwrite the one in the data 
store, which leaves some columns missing from COLUMN_STATS_ACCURATE, for example:
   
{"COLUMN_STATS":{"col_0":"true","col_1":"true","col_2":"true","col_3":"true","col_5":"true","col_6":"true","col_7":"true","col_8":"true"}}
   col_4 and col_9 are missing.
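   
   To make the failure mode concrete, here is a minimal sketch of how a 
whole-value write-back from a stale snapshot drops a column, while a merge 
applied against the current stored value does not. It is not HMS or DataNucleus 
code: the class name, the in-memory `store` map, and the table key "t" are all 
assumptions made for illustration, and the two writers are interleaved by hand 
rather than run on real threads.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory stand-in for the backing store; not HMS or DataNucleus code.
class LostUpdateSketch {
    // Maps a table name to the set of columns currently marked accurate.
    static final Map<String, Set<String>> store = new ConcurrentHashMap<>();

    // Read-modify-write of the whole value, similar to flushing a stale object.
    static Set<String> staleWriteBack() {
        store.put("t", new TreeSet<>());
        Set<String> snapshotA = new TreeSet<>(store.get("t")); // writer A reads
        Set<String> snapshotB = new TreeSet<>(store.get("t")); // writer B reads
        snapshotA.add("col_4");
        store.put("t", snapshotA); // A writes its whole value back
        snapshotB.add("col_9");
        store.put("t", snapshotB); // B writes stale state back, col_4 is lost
        return store.get("t");
    }

    // Each change is merged atomically into the current value,
    // analogous to a pure UPDATE evaluated inside the data store.
    static Set<String> atomicMerge() {
        store.put("t", new TreeSet<>());
        for (String col : new String[] {"col_4", "col_9"}) {
            store.compute("t", (key, current) -> {
                Set<String> next = new TreeSet<>(current);
                next.add(col);
                return next;
            });
        }
        return store.get("t");
    }

    public static void main(String[] args) {
        System.out.println("stale write-back: " + staleWriteBack());
        System.out.println("atomic merge:     " + atomicMerge());
    }
}
```

   The first method ends with only one of the two columns recorded; the second 
keeps both because no writer ever stores a value computed from an outdated read.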
   
   > Every iceberg table commits involves alter table operation and it's 
non-blocking ATM.
   
   The Iceberg commit relies on DB transaction atomicity, so it should take a 
row lock behind the scenes. The lock is quite narrow (TBL_ID, PARAM_KEY), but 
if the table has multiple commits at the same time, only one is allowed to 
alter `TABLE_PARAMS`.
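   
   The "only one commit wins" behavior can be sketched as a compare-and-swap on 
a single parameter row. This is an illustration only, not Iceberg or HMS code: 
the class, the key "tbl-1/metadata_location", and the values are hypothetical, 
and `ConcurrentHashMap.replace` merely stands in for a row lock plus a 
conditional update. Whether the real commit path works exactly like this is an 
assumption.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for one TABLE_PARAMS row; not Iceberg or HMS code.
class ParamCasSketch {
    static final ConcurrentHashMap<String, String> tableParams = new ConcurrentHashMap<>();

    // Succeeds only if the stored value still equals what the committer read.
    static boolean commit(String key, String expected, String updated) {
        return tableParams.replace(key, expected, updated);
    }

    // Two committers start from the same base value; returns {first, second}.
    static boolean[] raceFromSameBase() {
        String key = "tbl-1/metadata_location"; // hypothetical (TBL_ID, PARAM_KEY) pair
        tableParams.put(key, "v1");
        boolean first = commit(key, "v1", "v2-a");
        boolean second = commit(key, "v1", "v2-b"); // base is stale by now
        return new boolean[] {first, second};
    }

    public static void main(String[] args) {
        boolean[] result = raceFromSameBase();
        System.out.println("first commit: " + result[0] + ", second commit: " + result[1]);
        System.out.println("stored value: " + tableParams.get("tbl-1/metadata_location"));
    }
}
```

   The second committer is rejected instead of silently overwriting the first, 
so it has to re-read and retry rather than lose the earlier change.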
   
   > That’s the opposite of what CU experienced on a highly loaded MySQL 
cluster with S4U on the NEXT_TXN_ID table.
   
   This is because every HMS request needs to lock the single `NEXT_TXN_ID` 
row, compared with the same level of lock being distributed among different 
tables.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

