Jeremy Dyer created NIFI-2613:
---------------------------------
Summary: Support extracting content from Microsoft Excel (.xlsx) documents
Key: NIFI-2613
URL: https://issues.apache.org/jira/browse/NIFI-2613
Project: Apache NiFi
Issue Type: New Feature
Components: Extensions
Reporter: Jeremy Dyer
Assignee: Jeremy Dyer
Fix For: 1.1.0
Microsoft Excel is a wildly popular application that businesses rely on heavily
to store, visualize, and calculate data. A single company may well have
thousands of Excel documents containing data that could be very valuable if
ingested via NiFi and combined with other data sources. Apache POI is a popular
100% Java library for parsing several Microsoft document formats, including
Excel, and it can both read and write them. This issue would focus solely on
using Apache POI to parse an incoming .xlsx document into JSON output. The
processor should be able to limit which Excel sheets and which columns are
output. JSON is a natural choice for each output row, since each key can map to
an Excel column, which makes further downstream processing easier.
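A minimal sketch of the row-to-JSON mapping described above, with sheet and column filtering. It assumes the cell values have already been read into strings (in a real processor they would come from Apache POI, for example via its DataFormatter), so the workbook is modeled here as a map from sheet name to rows of strings. The class and method names (ExcelRowsToJson, convert, rowToJson, columnName) are hypothetical, and keying each value by its Excel column letter is one possible choice; using a header row for the keys would be an alternative.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: POI parsing is abstracted away; rows arrive as strings.
public class ExcelRowsToJson {

    // Converts a zero-based column index to an Excel-style name (0 -> "A", 25 -> "Z", 26 -> "AA").
    static String columnName(int index) {
        StringBuilder sb = new StringBuilder();
        int n = index + 1;
        while (n > 0) {
            int rem = (n - 1) % 26;
            sb.append((char) ('A' + rem));
            n = (n - 1) / 26;
        }
        return sb.reverse().toString();
    }

    // Minimal JSON string quoting: escapes quotes, backslashes and control characters.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    // Renders one row as a JSON object keyed by column letter, keeping only the
    // requested columns (an empty set means "all columns").
    static String rowToJson(List<String> cells, Set<String> columns) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (int i = 0; i < cells.size(); i++) {
            String col = columnName(i);
            if (!columns.isEmpty() && !columns.contains(col)) continue;
            if (!first) sb.append(',');
            sb.append(quote(col)).append(':').append(quote(cells.get(i)));
            first = false;
        }
        return sb.append('}').toString();
    }

    // Emits one JSON object per row for every sheet named in `sheets`
    // (an empty set means "all sheets").
    static List<String> convert(Map<String, List<List<String>>> workbook,
                                Set<String> sheets, Set<String> columns) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<List<String>>> sheet : workbook.entrySet()) {
            if (!sheets.isEmpty() && !sheets.contains(sheet.getKey())) continue;
            for (List<String> row : sheet.getValue()) {
                out.add(rowToJson(row, columns));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<List<String>>> workbook = new LinkedHashMap<>();
        workbook.put("Sales", List.of(List.of("2016-08-01", "Widget", "42"),
                                      List.of("2016-08-02", "Gadget", "7")));
        workbook.put("Notes", List.of(List.of("ignore me")));
        // Keep only the "Sales" sheet and columns B and C.
        for (String line : convert(workbook, Set.of("Sales"), Set.of("B", "C"))) {
            System.out.println(line);
        }
    }
}
```

Running main prints {"B":"Widget","C":"42"} and {"B":"Gadget","C":"7"}, one JSON object per row; the "Notes" sheet and column A are filtered out. Emitting one object per row keeps each record independent, which suits splitting or routing rows in later NiFi processors.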
This capability should probably live in a new "poi" module, since I expect many
more Microsoft document parsing capabilities could build on this initial effort.