[ 
https://issues.apache.org/jira/browse/HIVE-20664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16638970#comment-16638970
 ] 

Eugene Koifman commented on HIVE-20664:
---------------------------------------

+1 on patch 2

Regarding tests, take a look at ORC-228. The idea there was precisely to make 
it possible to create small stripes for tests.
Hive is now on ORC 1.5.3.

Let me know if you want to do that as part of this ticket, or I can commit this 
and you can do that in a follow-up JIRA.

> Potential ArrayIndexOutOfBoundsException in 
> VectorizedOrcAcidRowBatchReader.findMinMaxKeys
> ------------------------------------------------------------------------------------------
>
>                 Key: HIVE-20664
>                 URL: https://issues.apache.org/jira/browse/HIVE-20664
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>            Reporter: Saurabh Seth
>            Assignee: Saurabh Seth
>            Priority: Minor
>         Attachments: HIVE-20664.2.patch, HIVE-20664.patch
>
>
> [~ekoifman], could you please confirm if my understanding is correct and if 
> so, review the fix?
> In the method {{VectorizedOrcAcidRowBatchReader.findMinMaxKeys}}, the code 
> snippet that identifies the first and last stripe indices in the current 
> split could result in an ArrayIndexOutOfBoundsException if a complete split 
> is within the same stripe:
> {noformat}
>     for(int i = 0; i < stripes.size(); i++) {
>       StripeInformation stripe = stripes.get(i);
>       long stripeEnd = stripe.getOffset() + stripe.getLength();
>       if(firstStripeIndex == -1 && stripe.getOffset() >= splitStart) {
>         firstStripeIndex = i;
>       }
>       if(lastStripeIndex == -1 && splitEnd <= stripeEnd &&
>           stripes.get(firstStripeIndex).getOffset() <= stripe.getOffset() ) {
>         //the last condition is for when both splitStart and splitEnd are in
>         // the same stripe
>         lastStripeIndex = i;
>       }
>     }
> {noformat}
> Consider the example where there are 2 stripes - 0-500 and 500-1000 and 
> splitStart is 600 and splitEnd is 800.
> In the first iteration of the loop, stripe.getOffset() is 0 and stripeEnd is 
> 500. In this iteration, neither of the if statement conditions will be met 
> and firstStripeIndex as well as lastStripeIndex remain -1.
> In the second iteration of the loop, stripe.getOffset() is 500 and stripeEnd 
> is 1000. The first if statement condition will not be met in this case because 
> stripe's offset (500) is not greater than or equal to the splitStart (600). 
> However, in the second if statement, splitEnd (800) is <= stripeEnd(1000) and 
> it will try to compute the last condition 
> {{stripes.get(firstStripeIndex).getOffset() <= stripe.getOffset()}}. This 
> will throw an ArrayIndexOutOfBoundsException because firstStripeIndex is 
> still -1.
> I'm not sure whether this scenario is possible at all, hence logging this as 
> a low-priority issue. Perhaps block-based split generation using 
> BISplitStrategy could trigger this?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
