[ https://issues.apache.org/jira/browse/DRILL-5489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16001908#comment-16001908 ]

Paul Rogers commented on DRILL-5489:
------------------------------------

Similarly, the {{RepeatedVarCharOutput}} class does not protect itself against an
input file with more than 64K fields:

{code}
  @Override
  public void startField(int index) {
    fieldIndex = index;
    // No bounds check: index can exceed collectedFields.length (65536)
    collect = collectedFields[index];
    fieldOpen = true;
  }
{code}

Here, the parser counts the fields and calls {{startField}} for each one. When it
reaches the 65,537th field (index 65536), the method above indexes
{{collectedFields}} to determine whether the field is wanted. But that array is
hard-coded at 65,536 entries (indexes 0 to 65535), so the code fails with an
{{ArrayIndexOutOfBoundsException}}.

Instead, the code should either silently discard the extra fields or throw a
{{UserException}} reporting that the record has too many fields.
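A minimal sketch of both options, assuming a simplified, hypothetical {{BoundedFieldOutput}} class that mirrors only the field-selection state of {{RepeatedVarCharOutput}}: a flag chooses between discarding extra fields and failing. {{IllegalStateException}} stands in for Drill's {{UserException}}, and {{MAX_FIELDS}}, {{selectField}} and {{failOnExtraFields}} are illustrative names, not Drill's.

{code:java}
/**
 * Hypothetical, simplified stand-in for the field-selection state in
 * RepeatedVarCharOutput; not Drill's actual class.
 */
class BoundedFieldOutput {
  // Assumed capacity, matching the 65536-entry array described above.
  static final int MAX_FIELDS = 64 * 1024;

  private final boolean[] collectedFields = new boolean[MAX_FIELDS];
  private final boolean failOnExtraFields;
  private int fieldIndex = -1;
  private boolean collect;
  private boolean fieldOpen;

  BoundedFieldOutput(boolean failOnExtraFields) {
    this.failOnExtraFields = failOnExtraFields;
  }

  /** Marks a field as wanted by the query; rejects indexes outside the array. */
  void selectField(int index) {
    if (index < 0 || index >= collectedFields.length) {
      throw new IllegalArgumentException("Field index out of range: " + index);
    }
    collectedFields[index] = true;
  }

  /** Called by the parser at the start of each field, as in the original. */
  void startField(int index) {
    fieldIndex = index;
    if (index >= 0 && index < collectedFields.length) {
      collect = collectedFields[index];
    } else if (failOnExtraFields) {
      // Stand-in for Drill's UserException: report the limit clearly.
      throw new IllegalStateException(
          "Record has too many fields: index " + index
              + " exceeds the limit of " + collectedFields.length);
    } else {
      // Lenient mode: the extra field is simply not collected.
      collect = false;
    }
    fieldOpen = true;
  }

  boolean isCollecting() {
    return collect;
  }
}
{code}

Discarding is arguably friendlier for ragged files whose extra trailing fields no query selects; failing makes the 64K limit visible to the user instead of surfacing as a system error.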

> Unprotected array access in RepeatedVarCharOutput ctor
> ------------------------------------------------------
>
>                 Key: DRILL-5489
>                 URL: https://issues.apache.org/jira/browse/DRILL-5489
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.10.0
>            Reporter: Paul Rogers
>            Priority: Minor
>
> Suppose a user runs a query of the form:
> {code}
> SELECT columns[70000] FROM `dfs`.`mycsv.csv`
> {code}
> Internally, this creates a {{PathSegment}} to represent the selected
> column. This segment is passed into the {{RepeatedVarCharOutput}} constructor,
> where it is used to set a flag in an array of 64K booleans. But while the code
> is very diligent about making sure that the column name is "columns" and that
> the path segment is an array, it does not check the array index itself. Instead:
> {code}
>         for(Integer i : columnIds){
>           ...
>           // No check that 0 <= i < fields.length
>           fields[i] = true;
>         }
> {code}
> We need to add a bounds check that rejects invalid array indexes: negative
> values, or values of 64K (65,536) and above. Perhaps code further up the
> hierarchy already performs this check; if so, it should perform the other
> checks (column name and array segment) as well. Leaving the checks incomplete
> is confusing.
> The result:
> {code}
> Exception (no rows returned): 
> org.apache.drill.common.exceptions.UserRemoteException: 
> SYSTEM ERROR: ArrayIndexOutOfBoundsException: 70000
> {code}
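As a sketch of the bounds check the issue asks for, the loop over {{columnIds}} could validate each index before setting its flag. The class name {{ColumnMaskBuilder}}, the constant {{MAX_FIELDS}}, and the use of {{IllegalArgumentException}} in place of Drill's {{UserException}} are assumptions for illustration; this is not Drill's actual constructor code.

{code:java}
import java.util.List;

/** Hypothetical helper; not Drill's actual constructor code. */
class ColumnMaskBuilder {
  // Assumed limit, matching the array of 64K booleans described above.
  static final int MAX_FIELDS = 64 * 1024;

  /** Builds the "wanted field" flags, rejecting null, negative, or too-large indexes. */
  static boolean[] build(List<Integer> columnIds) {
    boolean[] fields = new boolean[MAX_FIELDS];
    for (Integer i : columnIds) {
      // Validate before indexing, so a bad index never reaches the array.
      if (i == null || i < 0 || i >= MAX_FIELDS) {
        throw new IllegalArgumentException(
            "columns[" + i + "] is out of range; valid indexes are 0 to " + (MAX_FIELDS - 1));
      }
      fields[i] = true;
    }
    return fields;
  }
}
{code}

With this check, {{columns[70000]}} fails with a message naming the bad index and the valid range, rather than a bare {{ArrayIndexOutOfBoundsException: 70000}}.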



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
