[ 
https://issues.apache.org/jira/browse/NIFI-11167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17707330#comment-17707330
 ] 

Daniel Stieglitz commented on NIFI-11167:
-----------------------------------------

[~exceptionfactory] It looks like I will need read styles to be true after all. 
As per the 
[StreamingReader.Builder|https://pjfanning.github.io/excel-streaming-reader/javadocs/3.5.0/index.html?com/github/pjfanning/xlsx/StreamingReader.Builder.html]
 javadocs for setReadStyles, the style formats are used to format dates (in our 
case, they can help determine whether a cell contains a date value). 

{quote}
The style data is very useful for formatting numbers in particular because the 
raw numbers in the Excel file are in double precision format and may not match 
exactly what you see in the Excel cell.

With date and timestamp data, the raw data is also numeric and without the 
style data, the reader will treat the data as numeric. If you already know if 
certain cells hold date or timestamp data, the getLocalDateTimeCellValue 
and getDateCellValue methods will work even if you have disabled the reading of 
style data.
{quote}
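
To illustrate the point in the quoted javadoc, here is a minimal, standard-library-only sketch of why the style data matters. It does not use excel-streaming-reader or POI, and it is not how either library works internally. The class name ExcelDateSketch, its methods, and the format-string heuristic are all hypothetical and exist only for illustration. The one assumption taken from Excel itself is the 1900 date system, where a date is stored as a number of days and the effective day zero is 1899-12-30.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;

// Hypothetical illustration only: not part of NiFi, POI, or excel-streaming-reader.
final class ExcelDateSketch {

    // Excel's 1900 date system counts days, with the fraction holding the time of day.
    // Because Excel treats 1900 as a leap year, the effective day zero for
    // modern dates is 1899-12-30.
    private static final LocalDate EPOCH = LocalDate.of(1899, 12, 30);

    // Convert a raw serial number to a date-time, rounded to the nearest second.
    static LocalDateTime toLocalDateTime(double serial) {
        long days = (long) Math.floor(serial);
        long seconds = Math.round((serial - days) * 86_400);
        return EPOCH.atStartOfDay().plusDays(days).plusSeconds(seconds);
    }

    // Simplified stand-in for a style check: after dropping quoted literals and
    // bracketed sections such as [Red] or [$-409], a format string that still
    // contains d, m, y, h, or s is assumed to describe a date or time.
    static boolean looksLikeDateFormat(String formatString) {
        if (formatString == null) {
            return false;
        }
        String stripped = formatString
                .replaceAll("\"[^\"]*\"", "")
                .replaceAll("\\[[^\\]]*\\]", "")
                .toLowerCase();
        return stripped.matches(".*[dmyhs].*");
    }

    // Without a format string the raw value can only be reported as a number;
    // with one, the same value may turn out to be a date.
    static Object interpret(double rawValue, String formatString) {
        return looksLikeDateFormat(formatString)
                ? toLocalDateTime(rawValue)
                : Double.valueOf(rawValue);
    }

    public static void main(String[] args) {
        double raw = 45000.5;
        System.out.println(interpret(raw, null));         // styles not read: stays numeric
        System.out.println(interpret(raw, "0.00"));       // numeric style: stays numeric
        System.out.println(interpret(raw, "yyyy-mm-dd")); // date style: read as a date-time
    }
}
```

The same raw value is reported as a number or as a date depending only on the format string, which is the information that is lost when style reading is disabled.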

> Add Excel Record Reader
> -----------------------
>
>                 Key: NIFI-11167
>                 URL: https://issues.apache.org/jira/browse/NIFI-11167
>             Project: Apache NiFi
>          Issue Type: New Feature
>          Components: Extensions
>            Reporter: David Handermann
>            Assignee: Daniel Stieglitz
>            Priority: Minor
>
> A new Excel Record Reader should be implemented to support reading XLSX 
> spreadsheet rows as NiFi Records. This Reader will enable integration with 
> various record-oriented components, obviating the need for the narrowly 
> focused ConvertExcelToCSVProcessor. The initial version of the Excel Reader 
> should not support the legacy binary XLS format.
> The ExcelReader should use a library that supports reading from a stream of 
> rows to avoid consuming large amounts of heap memory during processing.
> The ExcelReader should support configurable properties to read selected 
> sheets. With Excel supporting typed field values, some amount of field type 
> mapping will be required. Additional input filtering properties should not be 
> implemented as existing Processors like QueryRecord support a wide variety of 
> filtering and projection use cases.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
