[ 
https://issues.apache.org/jira/browse/DRILL-8070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17456007#comment-17456007
 ] 

ASF GitHub Bot commented on DRILL-8070:
---------------------------------------

pjfanning commented on pull request #2399:
URL: https://github.com/apache/drill/pull/2399#issuecomment-989221637


   @cgivre the test uses a new xlsx file that contains a copy of the 'data' 
sheet from test_data.xlsx, but with 3 empty rows before the header row. Without 
my code change, the existing format-excel code skips too far because its skip 
logic is wrong: it does not handle rows that are simply missing from the iterator.




> format-excel assumes that rowIterator returns every row
> -------------------------------------------------------
>
>                 Key: DRILL-8070
>                 URL: https://issues.apache.org/jira/browse/DRILL-8070
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Data Types
>            Reporter: PJ Fanning
>            Priority: Major
>
> In ExcelBatchReader, this code wrongly assumes that the iterator returns every row:
> {code:java}
>     for (int i = 1; i < rowNumber; i++) {
>         currentRow = rowIterator.next();
>     }
> {code}
>  
> There are 2 for loops like this.
> The iterator does not necessarily return empty rows: rows without populated 
> cells can be skipped entirely, so calling next() a fixed number of times can 
> advance past the intended row. Think of the sheet as a sparse matrix, because 
> that is how it is stored.
>  
>  
>  
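To illustrate the sparse-iterator problem described above, here is a minimal, self-contained sketch. It is not the Drill or POI code: the class `SparseRowSkip`, its methods, and the `TreeMap` used as a stand-in for a sheet are all hypothetical. The sample data mirrors the test case from the comment (3 empty rows before the header), and row indexes are assumed to be zero-based.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;

final class SparseRowSkip {

    // Hypothetical stand-in for a sheet: only populated rows are stored,
    // keyed by zero-based row index. Indexes 0..2 are empty, so they are absent.
    static TreeMap<Integer, String> sampleSheet() {
        TreeMap<Integer, String> rows = new TreeMap<>();
        rows.put(3, "header");
        rows.put(4, "row A");
        rows.put(5, "row B");
        rows.put(6, "row C");
        return rows;
    }

    // Count-based skip: assumes each next() call advances exactly one row index.
    // Returns the index of the row it ends up treating as the header, or null.
    static Integer skipByCount(TreeMap<Integer, String> sheet, int headerIndex) {
        Iterator<Map.Entry<Integer, String>> it = sheet.entrySet().iterator();
        for (int i = 0; i < headerIndex && it.hasNext(); i++) {
            it.next(); // consumes populated rows, because empty ones never appear
        }
        return it.hasNext() ? it.next().getKey() : null;
    }

    // Index-based skip: checks each returned row's own index against the target.
    // Returns the index of the first row at or after headerIndex, or null.
    static Integer skipByRowIndex(TreeMap<Integer, String> sheet, int headerIndex) {
        for (Map.Entry<Integer, String> entry : sheet.entrySet()) {
            if (entry.getKey() >= headerIndex) {
                return entry.getKey();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        TreeMap<Integer, String> sheet = sampleSheet();
        System.out.println("count-based skip lands on index " + skipByCount(sheet, 3));
        System.out.println("index-based skip lands on index " + skipByRowIndex(sheet, 3));
    }
}
```

In this sketch the count-based version consumes the header and two data rows while "skipping" the three empty rows, which is the over-skip the comment describes. Comparing against each row's own index avoids that, at the cost of assuming the iterator yields rows in ascending index order.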



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
