[
https://issues.apache.org/jira/browse/DRILL-5489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16001904#comment-16001904
]
Paul Rogers commented on DRILL-5489:
------------------------------------
While we are at it, we might as well fix another bit of bogus code:
{code}
for(Integer i : columnIds){
maxField = 0;
maxField = Math.max(maxField, i);
...
}
{code}
The code to compute maxField simply uses the last value, not the maximum value.
This will be thrown off by a query of the form:
{code}
SELECT columns[20], columns[1] FROM ...
{code}
This will muck up the text reader. The text reader "shortcuts" the rest of the
line if it need not read any more fields. In {{TextReader.parseRecord()}}:
{code}
earlyTerm = ! parseField();
...
if (earlyTerm) {
if (ch != newLine) {
input.skipLines(1);
}
break;
}
{code}
And:
{code}
private final boolean parseField() throws IOException {
...
return output.endField();
{code}
And, in {{RepeatedVarCharOutput.endField()}}:
{code}
public boolean endField() {
...
return fieldIndex < maxField;
}
{code}
The only reason that this does not cause data loss is the insertion of an
unnecessary sort earlier in the constructor:
{code}
Collections.sort(columnIds);
...
for (Integer i : columnIds){
{code}
We can remove the sort and compute the maxField correct and get the same result
with simpler code.
> Unprotected array access in RepeatedVarCharOutput ctor
> ------------------------------------------------------
>
> Key: DRILL-5489
> URL: https://issues.apache.org/jira/browse/DRILL-5489
> Project: Apache Drill
> Issue Type: Bug
> Affects Versions: 1.10.0
> Reporter: Paul Rogers
> Priority: Minor
>
> Suppose a user runs a query of form:
> {code}
> SELECT columns[70000] FROM `dfs`.`mycsv.csv`
> {code}
> Internally, this will create a {{PathSegment}} to represent the selected
> column. This is passed into the {{RepeatedVarCharOutput}} constructor where
> it is used to set a flag in an array of 64K booleans. But, while the code is
> very diligent of making sure that the column name is "columns" and that the
> path segment is an array, it does not check the array value. Instead:
> {code}
> for(Integer i : columnIds){
> ...
> fields[i] = true;
> }
> {code}
> We need to add a bounds check to reject array indexes that are not valid:
> negative or above 64K. It may be that the code further up the hierarchy does
> the checks. But, if so, it should do the other checks as well. Leaving the
> checks incomplete is confusing.
> The result:
> {code}
> Exception (no rows returned):
> org.apache.drill.common.exceptions.UserRemoteException:
> SYSTEM ERROR: ArrayIndexOutOfBoundsException: 70000
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)