Csaba Ringhofer created HIVE-22565:
--------------------------------------
Summary: Make calling alter_table unnecessary during inserts
Key: HIVE-22565
URL: https://issues.apache.org/jira/browse/HIVE-22565
Project: Hive
Issue Type: Improvement
Components: Metastore
Reporter: Csaba Ringhofer
tl dr: it would be good to set the table's writeId during commit to make the
extra alter_table call unnecessary
This came up during the implementation of (insert_only) ACID inserts in Apache
Impala.
The following description deals with the non-partitioned case, partitioned
tables are a bit more complicated.
apply_table is called by Impala during inserts mainly to set stats to
non-accurate:
- the table's writeId is set to the writeId of the insert
- remove table property column_stats_accurate
In the past we had the false assumption that setting the writeId is done
automatically by committing the transaction. It would be nice to have a version
of commit that actually does this - commits the transaction + changes the
writeId/marks stats as inaccurate in a single atomic step.
The current state of alter_table + commit being non-atomic can lead to weird
scenarios in parallel inserts(+ computes stats).
Impala calls apply_table before commit, so the calls to HMS during inserts look
like this:
1. open new transaction
2. get shared lock on the table
3. get write id
... write the files ...
4. call alter_table to remove column_stats_accurate (this also sets writeId)
5. commit the transaction
So the following can occur with two parallel writes + a compute stats:
1. txn 1 calls alter_table (sets to writeId of txn 1)
2. txn 2 calls alter_table (sets to writeId of txn 2)
3. txn 2 is committed
4. compute stats runs (gets validWriteList, reads the table, sets the stats
with alter_table)
5. txn 1 is committed
The compute stats will have the writeId of txn 2 in it's validWriteId list, so
it will assume that it computed accurate stats. After step 5. the stats will be
considered accurate while they do not contain the new rows from txn 1.
Another issue with frequent alter_table calls is that the effect of actual
ALTER TABLE commands that use shared locks (I think SET TBLPROPERTIES does this
in Hive) can be simply overwritten by alter_table calls from inserts that used
a different cached version of the table. This is generally a problem if ALTER
TABLE is called from different clients (without taking exclusive lock), but
doing parallel DMLs is probably more common than doing parallel DDLs.
So issues can occur even if clients use the API correctly - another problem is
that the hard to use API may lead to buggy client implementation that can
easily mess up things for other components too.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)