[jira] [Commented] (NIFI-2613) Support extracting content from Microsoft Excel (.xlxs) documents

ASF GitHub Bot (JIRA) Tue, 14 Feb 2017 11:59:00 -0800

    [ 
https://issues.apache.org/jira/browse/NIFI-2613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15866519#comment-15866519
 ]


ASF GitHub Bot commented on NIFI-2613:
--------------------------------------

Github user jdye64 commented on the issue:

    https://github.com/apache/nifi/pull/929
  
    @jvwing wow, sorry it's been so long since my last response. I've tried to 
go through and make all of the cleanup changes you suggested.
    
    Yes, the intention is a single output per sheet in the Excel file. Keep in 
mind that if the file is in .xls rather than .xlsx format, you would see the 
behavior you are experiencing. Can you attach the Excel doc you were testing 
with?
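
    A quick way to check which format a test file really is, regardless of 
its extension: the sketch below is not part of the PR, and the function name 
sniff_excel_format is hypothetical. It assumes that legacy .xls files begin 
with the OLE2 compound-file signature and that .xlsx files are ZIP archives 
that conventionally contain an xl/workbook.xml entry.

```python
import zipfile

# 8-byte signature at the start of OLE2 compound files (the legacy .xls container).
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def sniff_excel_format(path):
    """Return 'xlsx', 'xls', or 'unknown' based on the file's container type."""
    with open(path, "rb") as f:
        header = f.read(8)
    if header == OLE2_MAGIC:
        return "xls"
    # .xlsx is a ZIP package; look for the conventional workbook part inside it.
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as zf:
            if "xl/workbook.xml" in zf.namelist():
                return "xlsx"
    return "unknown"
```

    A file named with an .xlsx extension that reports 'xls' here was most 
likely saved in the older binary format and renamed.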
    
    I made the change to the case of the attribute.
    
    I also adjusted the error handling so that the end user is not pummeled 
with a full stack trace.


> Support extracting content from Microsoft Excel (.xlxs) documents
> -----------------------------------------------------------------
>
>                 Key: NIFI-2613
>                 URL: https://issues.apache.org/jira/browse/NIFI-2613
>             Project: Apache NiFi
>          Issue Type: New Feature
>          Components: Extensions
>            Reporter: Jeremy Dyer
>            Assignee: Jeremy Dyer
>
> Microsoft Excel is a wildly popular application that businesses rely heavily 
> on to store, visualize, and calculate data. Any single company most likely 
> has thousands of Excel documents containing data that could be very valuable 
> if ingested via NiFi and combined with other datasources. Apache POI is a 
> popular 100% Java library for parsing several Microsoft document formats 
> including Excel. Apache POI is extremely flexible and can do several things. 
> This issue would focus solely on using Apache POI to parse an incoming .xlsx 
> document and convert it to CSV. The processor should be capable of limiting 
> which Excel sheets are converted. CSV seems like the natural choice for 
> outputting each row since this feature is already available in Excel and 
> feels very natural to most Excel sheet designs.
> This capability should most likely introduce a new "poi" module as I envision 
> many more capabilities around parsing Microsoft documents could come from 
> this base effort.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
