[ https://issues.apache.org/jira/browse/NIFI-8534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17603606#comment-17603606 ]
John Wise commented on NIFI-8534:
---------------------------------
[~mike.thomsen] - Unfortunately, the documents are on a secure internal network
and can't be transferred out. However, one of our devs looked through the source
and found that this processor examines only the first row and first column for
data. If it finds nothing in either, it presumes that the entire sheet is empty
and skips it. Many of these spreadsheets use empty rows and columns for visual
formatting, which is why the processor was skipping all of the actual data
they contain.
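To illustrate the behavior described above, here is a minimal sketch that is not taken from the processor's source. It models a worksheet as a plain String[][] grid (an assumption made for illustration; the real processor works on Excel cell events), and the class and method names are hypothetical. It contrasts the first-row/first-column heuristic with a check that scans every cell.

```java
// Hypothetical stand-in for a worksheet: a grid of cell values,
// where null or a whitespace-only string means "blank".
final class SheetEmptinessSketch {

    static boolean isBlank(String cell) {
        return cell == null || cell.trim().isEmpty();
    }

    // The heuristic described in the comment: only the first row and the
    // first column are inspected. If both are blank, the sheet is presumed empty.
    static boolean looksEmptyByFirstRowAndColumn(String[][] sheet) {
        if (sheet.length == 0) {
            return true;
        }
        for (String cell : sheet[0]) {
            if (!isBlank(cell)) {
                return false;
            }
        }
        for (String[] row : sheet) {
            if (row.length > 0 && !isBlank(row[0])) {
                return false;
            }
        }
        return true;
    }

    // A safer check: the sheet is empty only if every cell is blank.
    static boolean isTrulyEmpty(String[][] sheet) {
        for (String[] row : sheet) {
            for (String cell : row) {
                if (!isBlank(cell)) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // A sheet that uses a blank first row and first column as visual padding.
        String[][] padded = {
            {null, null, null},
            {null, "Name", "Qty"},
            {null, "Bolt", "12"}
        };
        System.out.println(looksEmptyByFirstRowAndColumn(padded)); // prints "true"
        System.out.println(isTrulyEmpty(padded));                  // prints "false"
    }
}
```

A sheet padded this way is reported as empty by the heuristic even though it holds data, which matches the 0-byte output we were seeing.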
Our dev created an updated version, which I used to successfully convert the
spreadsheets. I did have to add some additional parsing to determine where the
actual data started, based primarily on known header rows; ideally, that would
also be part of this processor, but we've moved on to other projects & haven't
gone back to add it ourselves.
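The extra parsing mentioned above could look roughly like the following. This is only a sketch under stated assumptions, not the code we actually ran: the sheet is again modeled as a String[][] grid, the class and method names are hypothetical, and the known header names are assumed to be supplied in lower case.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class HeaderRowSketch {

    // Returns the index of the first row that contains every known header
    // (compared case-insensitively after trimming), or -1 if no row matches.
    // knownHeaders is expected to hold lower-case names.
    static int findHeaderRow(String[][] sheet, Set<String> knownHeaders) {
        for (int r = 0; r < sheet.length; r++) {
            Set<String> present = new HashSet<>();
            for (String cell : sheet[r]) {
                if (cell != null) {
                    present.add(cell.trim().toLowerCase(Locale.ROOT));
                }
            }
            if (present.containsAll(knownHeaders)) {
                return r;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String[][] sheet = {
            {null, null, null},
            {null, "Quarterly Report", null},
            {null, "Name", "Qty"},
            {null, "Bolt", "12"}
        };
        Set<String> headers = new HashSet<>(Arrays.asList("name", "qty"));
        System.out.println(findHeaderRow(sheet, headers)); // prints "2"
    }
}
```

Rows after the returned index would then be treated as data. Note that an empty set of known headers matches the first row, so callers should pass at least one header name.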
I asked our dev for his changes, which he added to the 1.16 version of the
processor:
+*Lines 747-750:*+
{{// if this cell falls outside of our area, or has been explicitly marked as a skipped column, return and don't write it out}}
{{if (!firstRow && readConfig.getLastColumn() != -1 && (thisCol < readConfig.getFirstColumn() || thisCol > readConfig.getLastColumn())) {}}
{{    return;}}
{{}}}
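For anyone who wants to see the effect of that condition in isolation, here is a self-contained sketch. The names firstRow, thisCol, getFirstColumn() and getLastColumn() come from the snippet above; the ReadConfig class below is a hypothetical stand-in rather than the processor's own class, and reading -1 as "no last column set" is my assumption based on the condition itself.

```java
final class ColumnFilterSketch {

    // Hypothetical stand-in for the processor's read configuration.
    static final class ReadConfig {
        private final int firstColumn;
        private final int lastColumn; // assumed: -1 means no last column is set

        ReadConfig(int firstColumn, int lastColumn) {
            this.firstColumn = firstColumn;
            this.lastColumn = lastColumn;
        }

        int getFirstColumn() { return firstColumn; }
        int getLastColumn() { return lastColumn; }
    }

    // Same boolean condition as the snippet: cells in the first row are never
    // skipped, and no cell is skipped while the last column is -1.
    static boolean shouldSkip(boolean firstRow, int thisCol, ReadConfig readConfig) {
        return !firstRow
                && readConfig.getLastColumn() != -1
                && (thisCol < readConfig.getFirstColumn() || thisCol > readConfig.getLastColumn());
    }

    public static void main(String[] args) {
        ReadConfig bounded = new ReadConfig(2, 4);
        System.out.println(shouldSkip(false, 1, bounded)); // prints "true": left of the range
        System.out.println(shouldSkip(false, 3, bounded)); // prints "false": inside the range
        System.out.println(shouldSkip(true, 1, bounded));  // prints "false": first row is exempt
        System.out.println(shouldSkip(false, 9, new ReadConfig(2, -1))); // prints "false": no last column
    }
}
```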
> ConvertExcelToCSVProcessor outputting 0-byte files
> --------------------------------------------------
>
> Key: NIFI-8534
> URL: https://issues.apache.org/jira/browse/NIFI-8534
> Project: Apache NiFi
> Issue Type: Bug
> Components: Extensions
> Affects Versions: 1.12.1
> Reporter: John Wise
> Priority: Critical
>
> We're converting a cache of legacy Excel spreadsheets to CSV in order to
> merge and ingest them in a newer repository. We're experiencing an issue
> where the processor produces a 0-byte file for some worksheets, but not
> others. The processor itself doesn't log any errors or post any bulletins
> about potential data issues.
> Depending on the worksheet, this is happening with 1/3 to 1/2 of the files
> processed, so we're losing quite a bit of that data.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)