[jira] [Updated] (NIFI-2613) Support extracting content from Microsoft Excel (.xlsx) documents

Jeremy Dyer (JIRA) Wed, 24 Aug 2016 06:32:12 -0700

     [ https://issues.apache.org/jira/browse/NIFI-2613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Jeremy Dyer updated NIFI-2613:
------------------------------
    Description: 
Microsoft Excel is a wildly popular application that businesses rely heavily on 
to store, visualize, and calculate data. Any single company most likely has 
thousands of Excel documents containing data that could be very valuable if 
ingested via NiFi and combined with other data sources. Apache POI is a popular 
100% Java library for parsing several Microsoft document formats, including 
Excel. Apache POI is extremely flexible and can do far more than this, but this 
issue would focus solely on using it to parse an incoming .xlsx document and 
convert it to CSV. The processor should be capable of limiting which Excel 
sheets are output as well as which columns. CSV seems like the natural choice 
for outputting each row since Excel can already export to CSV and the format 
maps naturally onto most Excel sheet designs.

This capability should most likely introduce a new "poi" module, since I 
envision many more capabilities around parsing Microsoft documents coming from 
this base effort.

  was:
Microsoft Excel is a wildly popular application that businesses rely heavily on 
to store, visualize, and calculate data. Any single company most likely has 
thousands of Excel documents containing data that could be very valuable if 
ingested via NiFi and combined with other data sources. Apache POI is a popular 
100% Java library for parsing several Microsoft document formats, including 
Excel. Apache POI is extremely flexible and can do far more than this, but this 
issue would focus solely on using it to parse an incoming .xlsx document into 
a JSON output. The processor should be capable of limiting which Excel sheets 
are output as well as which columns. JSON seems like the natural choice for 
outputting each row since each key could map to an Excel column and make 
further downstream processing easier.

This capability should most likely introduce a new "poi" module, since I 
envision many more capabilities around parsing Microsoft documents coming from 
this base effort.


> Support extracting content from Microsoft Excel (.xlsx) documents
> -----------------------------------------------------------------
>
>                 Key: NIFI-2613
>                 URL: https://issues.apache.org/jira/browse/NIFI-2613
>             Project: Apache NiFi
>          Issue Type: New Feature
>          Components: Extensions
>            Reporter: Jeremy Dyer
>            Assignee: Jeremy Dyer
>             Fix For: 1.1.0
>
>
> Microsoft Excel is a wildly popular application that businesses rely heavily 
> on to store, visualize, and calculate data. Any single company most likely 
> has thousands of Excel documents containing data that could be very valuable 
> if ingested via NiFi and combined with other data sources. Apache POI is a 
> popular 100% Java library for parsing several Microsoft document formats, 
> including Excel. Apache POI is extremely flexible and can do far more than 
> this, but this issue would focus solely on using it to parse an incoming 
> .xlsx document and convert it to CSV. The processor should be capable of 
> limiting which Excel sheets are output as well as which columns. CSV seems 
> like the natural choice for outputting each row since Excel can already 
> export to CSV and the format maps naturally onto most Excel sheet designs.
>
> This capability should most likely introduce a new "poi" module, since I 
> envision many more capabilities around parsing Microsoft documents coming 
> from this base effort.
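
To make the "limit which columns, emit each row as CSV" behaviour in the
description concrete, here is a minimal sketch in plain Java. It is not the
proposed processor and it does not call Apache POI: it assumes the sheet's
cells have already been read into a list of string rows, and the class name
SheetToCsvSketch, the method names, and the "empty set means keep every
column" rule are all hypothetical choices made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch only: not the NiFi processor and not the Apache POI API. */
final class SheetToCsvSketch {

    /** Quote a cell when it contains a comma, a double quote, or a line break. */
    static String escape(String cell) {
        if (cell == null) {
            return "";
        }
        boolean needsQuotes = cell.contains(",") || cell.contains("\"")
                || cell.contains("\n") || cell.contains("\r");
        // Embedded double quotes are doubled inside a quoted field.
        return needsQuotes ? "\"" + cell.replace("\"", "\"\"") + "\"" : cell;
    }

    /**
     * Render one sheet's rows as CSV, keeping only the given zero-based columns.
     * Assumption of this sketch: an empty set means "keep every column".
     */
    static String toCsv(List<List<String>> rows, Set<Integer> keepColumns) {
        TreeSet<Integer> ordered = new TreeSet<>(keepColumns); // emit in column order
        StringBuilder out = new StringBuilder();
        for (List<String> row : rows) {
            List<String> cells = new ArrayList<>();
            if (ordered.isEmpty()) {
                for (String cell : row) {
                    cells.add(escape(cell));
                }
            } else {
                for (int col : ordered) {
                    // A column missing from a short row becomes an empty cell.
                    boolean present = col >= 0 && col < row.size();
                    cells.add(present ? escape(row.get(col)) : "");
                }
            }
            out.append(String.join(",", cells)).append("\n");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<List<String>> sheet = List.of(
                List.of("id", "name", "note"),
                List.of("1", "Acme, Inc.", "said \"hi\""));
        // Keep only the first two columns of the sheet.
        System.out.print(toCsv(sheet, Set.of(0, 1)));
    }
}
```

Limiting which sheets are output could follow the same pattern, for example
by filtering on sheet name before calling toCsv; reading the workbook itself
is the part that would be delegated to Apache POI.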



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
