Dmitry Kulakov created TIKA-2927:
------------------------------------

             Summary: XSSFExcelExtractorDecorator emits non-existent empty rows.
                 Key: TIKA-2927
                 URL: https://issues.apache.org/jira/browse/TIKA-2927
             Project: Tika
          Issue Type: Bug
          Components: parser
    Affects Versions: 1.22, 1.21, 1.20
            Reporter: Dmitry Kulakov


Parsing xlsx files with _includeMissingRows_ set to true in the 
_OfficeParserConfig_ causes the _XSSFExcelExtractorDecorator_ to emit spurious 
empty rows: for each row, the number of extra empty rows equals the current row 
number - 1. The cause is that _lastSeenRow_ is never updated, so every new row 
is treated as the first non-empty row. The fix is simple: update _lastSeenRow_ 
after the start of every new row. I will add the fix along with the relevant 
unit test in a pull request.
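
To illustrate the behaviour described above, here is a minimal, self-contained 
sketch. It is not the Tika source: the class _MissingRowsSketch_, the method 
_emitRows_, the "<empty>" marker and the 0-based row numbering are all 
assumptions made for the example. Only the idea of a _lastSeenRow_ counter that 
must be updated on every row comes from this report.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of missing-row padding; not the actual Tika code.
public class MissingRowsSketch {

    /**
     * Emits one entry per row, padding gaps with "<empty>" markers.
     * rowNumbers are assumed to be 0-based and ascending.
     * updateLastSeenRow = false reproduces the reported bug.
     */
    static List<String> emitRows(int[] rowNumbers, boolean updateLastSeenRow) {
        List<String> out = new ArrayList<>();
        int lastSeenRow = -1; // -1 means no row has been seen yet
        for (int rowNum : rowNumbers) {
            // Pad the rows missing between the previous row and this one.
            for (int missing = lastSeenRow + 1; missing < rowNum; missing++) {
                out.add("<empty>");
            }
            out.add("row " + rowNum);
            if (updateLastSeenRow) {
                lastSeenRow = rowNum; // the update that the report says is missing
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] rows = {0, 1, 3}; // row 2 is genuinely missing from the sheet
        System.out.println("without update: " + emitRows(rows, false));
        System.out.println("with update:    " + emitRows(rows, true));
    }
}
```

With the rows 0, 1 and 3, the variant without the update produces seven entries 
(four of them spurious or repeated padding), while the variant with the update 
produces four entries, with a single "<empty>" marker standing in for row 2.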



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
