Dmitry Kulakov created TIKA-2927:
------------------------------------
Summary: XSSFExcelExtractorDecorator emits non-existent empty rows.
Key: TIKA-2927
URL: https://issues.apache.org/jira/browse/TIKA-2927
Project: Tika
Issue Type: Bug
Components: parser
Affects Versions: 1.22, 1.21, 1.20
Reporter: Dmitry Kulakov
Parsing xlsx files with _includeMissingRows_ set to true in the
_OfficeParserConfig_ causes the _XSSFExcelExtractorDecorator_ to emit extra
empty rows: before each row it emits (current row number - 1) empty rows,
even when the preceding rows exist. The cause is that _lastSeenRow_ is never
updated, so every new row is treated as the first non-empty row. The fix is
simple: update _lastSeenRow_ whenever a new row starts. I will add the fix,
along with a relevant unit test, in a pull request.
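
To illustrate the behaviour described above, here is a minimal standalone
sketch. It is not the Tika source: the class RowGapSketch, the method emit,
and the updateLastSeenRow flag are hypothetical names made up for this
example, and 0-based row indices starting from lastSeenRow = -1 are an
assumption. Only the name lastSeenRow comes from the report.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of a row handler that pads gaps with empty rows.
public class RowGapSketch {

    /**
     * rowNums: 0-based indices of the rows actually present, ascending
     * (an assumption of this sketch).
     * updateLastSeenRow: false reproduces the reported behaviour,
     * true applies the proposed fix.
     */
    static List<String> emit(int[] rowNums, boolean updateLastSeenRow) {
        List<String> out = new ArrayList<>();
        int lastSeenRow = -1; // no row seen yet
        for (int rowNum : rowNums) {
            // Emit one empty row for every index between the last seen
            // row and the current one.
            for (int missing = lastSeenRow + 1; missing < rowNum; missing++) {
                out.add("<empty>");
            }
            out.add("row " + rowNum);
            if (updateLastSeenRow) {
                lastSeenRow = rowNum; // the proposed fix
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] rows = {0, 1, 3}; // row 2 is genuinely missing
        // Without the update: 4 empty rows are emitted in total.
        System.out.println(emit(rows, false));
        // With the update: exactly 1 empty row, for the missing row 2.
        System.out.println(emit(rows, true));
    }
}
```

With rows 0, 1 and 3 present, the variant that never updates lastSeenRow
pads from the top of the sheet before every row, while the fixed variant
emits a single empty row for the one that is actually missing.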
--
This message was sent by Atlassian Jira
(v8.3.2#803003)