[ 
https://issues.apache.org/jira/browse/HIVE-15267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16199267#comment-16199267
 ] 

Eugene Koifman commented on HIVE-15267:
---------------------------------------

One issue (not introduced by this patch) is that a NOT IN query cannot be 
split into multiple queries.
For example, if the input IN list is (5,6) and buildQueryWithINClause() 
produces 2 queries
"delete from T where a not in(5)"
"delete from T where a not in(6)"
the net effect will be to delete all rows, including those with a = 5 and those 
with a = 6: the first query deletes the rows with a = 6 and the second deletes 
the rows with a = 5.
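
A minimal sketch of why the split fails, with rows modelled as a list of values 
of column a. It also shows one possible alternative: joining the NOT IN batches 
with AND in a single statement, which keeps the semantics but does not shorten 
the statement. The class and method names are hypothetical and are not part of 
TxnUtils.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical illustration only; not code from TxnUtils. */
class NotInSplitDemo {

    /** Rows left after "delete from T where a not in(inList)": only values in inList survive. */
    static List<Integer> deleteWhereNotIn(List<Integer> rows, List<Integer> inList) {
        return rows.stream().filter(inList::contains).collect(Collectors.toList());
    }

    /** Runs one NOT IN delete per batch, as the split queries would. */
    static List<Integer> deleteInSeparateQueries(List<Integer> rows, List<List<Integer>> batches) {
        List<Integer> remaining = new ArrayList<>(rows);
        for (List<Integer> batch : batches) {
            remaining = deleteWhereNotIn(remaining, batch);
        }
        return remaining;
    }

    /** Builds a single statement whose NOT IN batches are joined with AND. */
    static String buildSingleNotInQuery(String prefix, String column, List<List<Integer>> batches) {
        return prefix + " where " + batches.stream()
            .map(b -> column + " not in("
                + b.stream().map(String::valueOf).collect(Collectors.joining(",")) + ")")
            .collect(Collectors.joining(" and "));
    }
}
```

With rows 4, 5, 6, 7, a single "not in(5,6)" leaves 5 and 6, while the two 
split queries leave nothing.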

Could these be given more meaningful names (instead of, or in addition to, the 
comments)?
{noformat}
int i = 0,  // cursor for the "inList" array.
j = 0,  // cursor for an element list per an 'IN'/'NOT IN'-clause
k = 0;  // cursor for in-clause lists per a query
{noformat}

{noformat}
if (newInclausePrefixJustAppended) {
  buf.delete(buf.length() - newInclausePrefix.length(), buf.length());
}
{noformat}
is problematic if _maxQueryLength_ is set very low for some reason.  In the 
worst case, the returned query doesn't have any IN clause at all, i.e. it would 
look like "delete from T", which will delete everything. This should probably 
throw.
It may be better to check the size before adding more chars to the query (as is 
done for each item using _nextItemNeeded_).
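
A rough sketch of the "check before appending" idea for the plain IN case, 
where splitting is safe. Everything here is an assumption made for 
illustration: the class and method names are hypothetical, the limit is 
measured in characters rather than KB, and this is not the TxnUtils 
implementation. It throws if even a single-item query cannot fit, so a query 
without an IN clause is never produced.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; names and structure are assumptions, not the TxnUtils code. */
class InClauseBuilderSketch {

    static List<String> buildQueries(String prefix, String column, List<Long> ids,
                                     int maxQueryLength) {
        String open = " where " + column + " in(";
        String close = ")";
        List<String> queries = new ArrayList<>();
        StringBuilder buf = null; // current query, without its closing parenthesis
        for (Long id : ids) {
            String item = String.valueOf(id);
            // Check the resulting length (comma + item + close) before appending anything.
            if (buf != null
                && buf.length() + 1 + item.length() + close.length() <= maxQueryLength) {
                buf.append(',').append(item);
                continue;
            }
            // The item does not fit: finish the current query, if there is one.
            if (buf != null) {
                queries.add(buf.append(close).toString());
            }
            // Refuse to continue if even a one-item query exceeds the limit.
            int minimal = prefix.length() + open.length() + item.length() + close.length();
            if (minimal > maxQueryLength) {
                throw new IllegalArgumentException(
                    "maxQueryLength " + maxQueryLength + " is too small for a one-item query");
            }
            buf = new StringBuilder(prefix).append(open).append(item);
        }
        if (buf != null) {
            queries.add(buf.append(close).toString());
        }
        return queries;
    }
}
```

Because the length is checked for each item before it is appended, no returned 
query is longer than the limit, and there is no prefix to remove afterwards.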




> Make query length calculation logic more accurate in TxnUtils.needNewQuery()
> ----------------------------------------------------------------------------
>
>                 Key: HIVE-15267
>                 URL: https://issues.apache.org/jira/browse/HIVE-15267
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive, Transactions
>    Affects Versions: 1.2.1, 2.1.0
>            Reporter: Wei Zheng
>            Assignee: Steve Yeom
>         Attachments: HIVE-15267.01.patch, HIVE-15267.02.patch
>
>
> In HIVE-15181 there's such review comment, for which this ticket will handle
> {code}
> in TxnUtils.needNewQuery() "sizeInBytes / 1024 > queryMemoryLimit" doesn't do 
> the right thing.
> If the user sets METASTORE_DIRECT_SQL_MAX_QUERY_LENGTH to 1K, they most 
> likely want each SQL string to be at most 1K.
> But if sizeInBytes=2047, this still returns false.
> It should include length of "suffix" in computation of sizeInBytes
> Along the same lines: the check for max query length is done after each batch 
> is already added to the query. Suppose there are 1000 9-digit txn IDs in each 
> IN(...). That's, conservatively, 18KB of text. So the length of each query is 
> increasing in 18KB chunks. 
> I think the check for query length should be done for each item in IN clause.
> If some DB has a limit on query length of X, then any query > X will fail. So 
> I think this must ensure not to produce any queries > X, even by 1 char.
> For example, case 3.1 of the UT generates a query of almost 4000 characters - 
> this is clearly > 1KB.
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
