[
https://issues.apache.org/jira/browse/HIVE-21114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16954328#comment-16954328
]
Ashutosh Bapat commented on HIVE-21114:
---------------------------------------
[~anishek] brought me to this one.
I quickly went through the patch, but haven't looked in detail. It looks like
we will allocate a transaction id for a "read-only" transaction as well but
annotate it as "read-only". Whether a transaction (really a statement) is
read-only is determined by parser/semantic analyzer. According to the SQL
standard queries can have side-effects meaning an apparent SELECT query might
change the db (through a subquery with UPDATE or a procedure/function invoked
by a function running DML etc.). So relying on the parser to decide whether
it's a read-only query may not be the right approach. Since the
parser/semantic analyzer can deem a "read-only" statement as "write" to be on
the safer side OR worst the other way round, which would be buggy. Also letting
a user specify whether a transaction is read-only won't generally help in
multi-statement scenario since the user may want to be on safer side to
accommodate writes in a transaction. A strategy that regards a transaction to
be read-only until a write will work here. The idea is to separate snapshot
from transaction id. Use the first one for reading data and fetch the later one
to track writes when the first write happens. So, if a query never allocated a
write id, it never require to get a transaction id and thus never wrote
anything. If we go that route, we will not create any transaction ids for
read-only transactions and won't have corresponding events. Furthermore that
might help us take out write-ids from the equation altogether.
During my investigation, I had also found another problem. Since the snapshot
and transaction id are tied together right now, a transaction would never
update its snapshot from one statement to the other in a multi-statement
transaction. This means that a true "read committed' multi-statement
transaction won't be possible. To fix that also requires us to separate
snapshot and transaction id.
Please let me know if I am missing something.
> Create read-only transactions
> -----------------------------
>
> Key: HIVE-21114
> URL: https://issues.apache.org/jira/browse/HIVE-21114
> Project: Hive
> Issue Type: Bug
> Components: Transactions
> Affects Versions: 4.0.0
> Reporter: Eugene Koifman
> Assignee: Denys Kuzmenko
> Priority: Major
> Attachments: HIVE-21114.1.patch, HIVE-21114.2.patch,
> HIVE-21114.3.patch, HIVE-21114.4.patch, HIVE-21114.5.patch,
> HIVE-21114.6.patch, HIVE-21114.7.patch
>
>
> With HIVE-21036 we have a way to indicate that a txn is read only.
> We should (at least in auto-commit mode) determine if the single stmt is a
> read and mark the txn accordingly.
> Then we can optimize {{TxnHandler.commitTxn()}} so that it doesn't do any
> checks in write_set etc.
> {{TxnHandler.commitTxn()}} already starts with {{lockTransactionRecord(stmt,
> txnid, TXN_OPEN)}} so it can read the txn type in the same SQL stmt.
> HiveOperation only has QUERY, which includes Insert and Select, so this
> requires figuring out how to determine if a query is a SELECT. By the time
> {{Driver.openTransaction();}} is called, we have already parsed the query so
> there should be a way to know if the statement only reads.
> For multi-stmt txns (once these are supported) we should allow user to
> indicate that a txn is read-only and then not allow any statements that can
> make modifications in this txn. This should be a different jira.
> cc [~ikryvenko]
--
This message was sent by Atlassian Jira
(v8.3.4#803005)