[jira] [Created] (HIVE-15267) Make query length calculation logic more accurate in TxnUtils.needNewQuery()

Wei Zheng (JIRA) Tue, 22 Nov 2016 13:16:12 -0800

Wei Zheng created HIVE-15267:
--------------------------------

             Summary: Make query length calculation logic more accurate in TxnUtils.needNewQuery()
                 Key: HIVE-15267
                 URL: https://issues.apache.org/jira/browse/HIVE-15267
             Project: Hive
          Issue Type: Bug
          Components: Hive, Transactions
    Affects Versions: 2.1.0, 1.2.1
            Reporter: Wei Zheng
            Assignee: Wei Zheng



The following review comment was raised in HIVE-15181; this ticket will address it:
{code}
In TxnUtils.needNewQuery(), the check "sizeInBytes / 1024 > queryMemoryLimit"
doesn't do the right thing.
If the user sets METASTORE_DIRECT_SQL_MAX_QUERY_LENGTH to 1K, they most likely
want each SQL string to be at most 1K.
But if sizeInBytes=2047, this still returns false, because integer division
gives 2047 / 1024 = 1, and 1 > 1 is false.
It should also include the length of the "suffix" when computing sizeInBytes.
Along the same lines, the max query length check is done only after each batch
has already been added to the query. Suppose there are 1000 9-digit txn IDs in
each IN(...). That is, conservatively, 18KB of text, so the length of each
query grows in 18KB increments.
I think the query length should be checked for each item added to the IN clause.
If some DB limits query length to X, then any query longer than X will fail. So
this code must never produce a query longer than X, not even by 1 char.
For example, case 3.1 of the unit test generates a query of almost 4000
characters, which is clearly > 1KB.
{code}
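A minimal Java sketch of what the review comment asks for. It is hypothetical and is not Hive's actual TxnUtils code: the class and method names (QueryLengthSketch, oldNeedNewQuery, buildQueries), the query shape prefix + "(" + ids + ")" + suffix, and treating the limit as a plain character count are all assumptions made for illustration. oldNeedNewQuery reproduces the truncating check quoted above, while buildQueries checks the length before each ID is appended and counts the closing parenthesis and suffix, so no generated query exceeds the limit by even one character.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch only, not Hive's TxnUtils implementation.
final class QueryLengthSketch {

  // Mirrors the check quoted in the review comment: long division truncates,
  // so with a limit of 1 (KB) any size up to 2047 bytes is not flagged.
  static boolean oldNeedNewQuery(long sizeInBytes, long queryMemoryLimitKb) {
    return sizeInBytes / 1024 > queryMemoryLimitKb;
  }

  // Assumed query shape: prefix + "(" + id1,id2,... + ")" + suffix.
  // Every returned query, suffix included, is at most maxQueryLength chars.
  static List<String> buildQueries(String prefix, String suffix,
                                   List<Long> ids, int maxQueryLength) {
    List<String> queries = new ArrayList<>();
    StringBuilder sb = new StringBuilder();
    for (long id : ids) {
      String item = Long.toString(id);
      // Check BEFORE appending: would ",item" plus ")" and the suffix overflow?
      if (sb.length() > 0
          && sb.length() + 1 + item.length() + 1 + suffix.length() > maxQueryLength) {
        queries.add(sb.append(')').append(suffix).toString());
        sb.setLength(0); // start a fresh query for this item
      }
      if (sb.length() == 0) {
        // Even a single-item query must fit, otherwise no valid split exists.
        if (prefix.length() + 1 + item.length() + 1 + suffix.length() > maxQueryLength) {
          throw new IllegalArgumentException(
              "maxQueryLength " + maxQueryLength + " is too small for id " + id);
        }
        sb.append(prefix).append('(').append(item);
      } else {
        sb.append(',').append(item);
      }
    }
    if (sb.length() > 0) {
      queries.add(sb.append(')').append(suffix).toString()); // close the last query
    }
    return queries;
  }

  public static void main(String[] args) {
    System.out.println(oldNeedNewQuery(2047, 1)); // 2047 / 1024 == 1, so prints false
    List<Long> ids = new ArrayList<>();
    for (long i = 0; i < 1000; i++) {
      ids.add(100_000_000L + i); // 1000 nine-digit txn IDs
    }
    // "delete from T where id in " and " and x = 1" are made-up SQL fragments.
    List<String> queries = buildQueries("delete from T where id in ", " and x = 1", ids, 1024);
    for (String q : queries) {
      System.out.println(q.length()); // each length is <= 1024
    }
  }
}
```

Checking per item costs one integer comparison per ID, which is negligible next to the string building itself, and it turns the limit into a hard upper bound instead of one that can be overshot by a whole batch.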



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
