[ 
https://issues.apache.org/jira/browse/NIFI-8534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17603606#comment-17603606
 ] 

John Wise commented on NIFI-8534:
---------------------------------

[~mike.thomsen] - Unfortunately, the documents are on a secure internal network 
& can't be transferred out.  However, one of our devs looked through the source 
and found that this processor is examining the first row & first column for 
data.  If it doesn't find anything in either, it presumes that the entire sheet 
is empty and skips it.  Since many of these spreadsheets use empty rows & 
columns for visual formatting, the processor was skipping all of the actual 
data contained within them.
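
To illustrate the failure mode described above, here is a minimal sketch in Java. It is not the NiFi source: the {{SheetEmptinessCheck}} class, its method names, and the {{String[][]}} sheet representation are all assumptions made for illustration. It contrasts a "first row and first column only" heuristic with a scan of every cell.

```java
// Hypothetical illustration only; this is NOT the NiFi processor's code.
final class SheetEmptinessCheck {
    private SheetEmptinessCheck() {}

    // A cell counts as data if it holds any non-whitespace text.
    static boolean hasText(String cell) {
        return cell != null && !cell.trim().isEmpty();
    }

    // Heuristic as described in the comment: inspect only the first row
    // and the first column, and treat the sheet as empty if both are blank.
    static boolean looksEmptyByFirstRowAndColumn(String[][] sheet) {
        if (sheet.length == 0) {
            return true;
        }
        for (String cell : sheet[0]) {
            if (hasText(cell)) {
                return false;
            }
        }
        for (String[] row : sheet) {
            if (row.length > 0 && hasText(row[0])) {
                return false;
            }
        }
        return true;
    }

    // Exhaustive check: the sheet is empty only if no cell anywhere has text.
    static boolean isTrulyEmpty(String[][] sheet) {
        for (String[] row : sheet) {
            for (String cell : row) {
                if (hasText(cell)) {
                    return false;
                }
            }
        }
        return true;
    }
}
```

A sheet padded with one blank row and one blank column for layout is reported as empty by the first check even though the second finds data in it.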

Our dev created an updated version, which I used to successfully convert the 
spreadsheets.  I did have to add some additional parsing to determine where the 
actual data started, based primarily on known header rows; ideally, that would 
also be part of this processor, but we've moved on to other projects & haven't 
gone back to add it ourselves.
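
For anyone needing the same workaround, the header-based parsing mentioned above could look roughly like the following sketch. It is not the code we actually ran: the {{HeaderRowLocator}} class, the {{findHeaderRow}} method, and the choice to match headers case-insensitively after trimming are all assumptions for illustration.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: locate the row where the real data starts by
// searching for a row that contains every known header name.
final class HeaderRowLocator {
    private HeaderRowLocator() {}

    // Trim and lower-case so that "Name " and "name" compare equal.
    private static String normalize(String cell) {
        return cell == null ? "" : cell.trim().toLowerCase(Locale.ROOT);
    }

    // Returns the index of the first row containing all expected headers,
    // or -1 if no such row exists.
    static int findHeaderRow(List<List<String>> rows, List<String> expectedHeaders) {
        Set<String> wanted = new HashSet<>();
        for (String header : expectedHeaders) {
            wanted.add(normalize(header));
        }
        for (int i = 0; i < rows.size(); i++) {
            Set<String> present = new HashSet<>();
            for (String cell : rows.get(i)) {
                present.add(normalize(cell));
            }
            if (present.containsAll(wanted)) {
                return i;
            }
        }
        return -1;
    }
}
```

Rows before the returned index can then be dropped as formatting padding, and the data rows are everything after it.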

I asked our dev for his changes, which he added to the 1.16 version of the 
processor:

+*Lines 747-750:*+
{{// if this cell falls outside of our area, or has been explicitly marked as a skipped column, return and don't write it out}}
{{if(!firstRow && readConfig.getLastColumn() != -1 && (thisCol < readConfig.getFirstColumn() || thisCol > readConfig.getLastColumn())) {}}
{{    return;}}
{{}}}
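
To make the behavior of that condition easier to check, here is the same boolean expression restated as a standalone predicate. The {{ColumnBoundsCheck}} class and {{shouldSkipCell}} method are hypothetical names; the parameters mirror the identifiers in the snippet above. Reading {{-1}} as "no last column configured" is an assumption based on the guard in the condition, not something confirmed against the processor's documentation.

```java
// Hypothetical standalone restatement of the condition from the snippet;
// the real processor reads firstColumn/lastColumn from readConfig.
final class ColumnBoundsCheck {
    private ColumnBoundsCheck() {}

    // Returns true when the cell should be skipped (not written out).
    // Cells are never skipped while firstRow is true, and the range check
    // is disabled entirely when lastColumn is -1 (assumed "not configured").
    static boolean shouldSkipCell(boolean firstRow, int thisCol,
                                  int firstColumn, int lastColumn) {
        return !firstRow
                && lastColumn != -1
                && (thisCol < firstColumn || thisCol > lastColumn);
    }
}
```

With a configured range of columns 2 through 5, a cell in column 0 is written out while {{firstRow}} is true and skipped on later rows; with {{lastColumn}} at -1, nothing is skipped.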

> ConvertExcelToCSVProcessor outputting 0-byte files
> --------------------------------------------------
>
>                 Key: NIFI-8534
>                 URL: https://issues.apache.org/jira/browse/NIFI-8534
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Extensions
>    Affects Versions: 1.12.1
>            Reporter: John Wise
>            Priority: Critical
>
> We're converting a cache of legacy Excel spreadsheets to CSV in order to 
> merge and ingest them in a newer repository.  We're experiencing an issue 
> where the processor produces a 0-byte file for some worksheets, but not 
> others.  The processor itself doesn't log any errors or post any bulletins 
> about potential data issues.
> Depending on the worksheet, this is happening with 1/3 to 1/2 of the files 
> processed, so we're losing quite a bit of that data.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
