In case it's useful, Microsoft also provides the Microsoft Office Planning 
Manager and Office Compatibility Packs free, which will allow you to bulk 
convert older Office formats ( 2003 and earlier ) to the new OOXML formats.

https://www.microsoft.com/en-us/download/details.aspx?id=21888#filelist
https://www.microsoft.com/en-us/download/details.aspx?id=3

Pete

From: [email protected] 
[mailto:[email protected]] On Behalf Of Gary Russo
Sent: Monday, October 20, 2014 10:44 AM
To: 'MarkLogic Developer Discussion'
Subject: Re: [MarkLogic Dev General] Is there a way to extract worksheet 
metadata from an Excel 97/2003?

Hello Ron,

Yes, it is feasible to do the metadata extraction upstream of MarkLogic.

It complicates things a little bit but it will be ok.

Apache Tika looks like a nice solution.

My client is a Microsoft shop and they use a product called Aspose to 
convert/extract data from spreadsheets.

The majority of spreadsheet formats that I need to ingest use the older 97/2003 
format. I can use the Aspose API to covert the older format to OOXML on the fly.

It's unfortunate that the MarkLogic xdmp:document-filter() API is not able to 
extract the "defined name" metadata from the "97/2003" file format.

I consider it to be a bug in the MarkLogic API because other Excel Spreadsheet 
extraction APIs (e.g., Aspose, Tika, Apache POI) can extract this data from the 
older file format.

Anyway, thanks for the info.


-          Gary R



From: [email protected] 
[mailto:[email protected]] On Behalf Of Ron Hitchens
Sent: Friday, October 17, 2014 11:52 AM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Is there a way to extract worksheet 
metadata from an Excel 97/2003?


   If it's feasible to do your metadata extraction upstream of MarkLogic (i.e., 
before insertion) you might take a look at Apache Tika.  It's designed for this 
sort of thing.

   You could also setup it up in a simple web service callable from MarkLogic.  
POST the spreadsheet to it and have it return the metadata in whatever form you 
like.

---
Ron Hitchens {[email protected]<mailto:[email protected]>}  +44 7879 
358212

On Oct 17, 2014, at 3:35 PM, Gary Russo 
<[email protected]<mailto:[email protected]>> wrote:

Hello Dennis,

Thanks for the info.

Yes, I tried xdmp:excel-convert() but this does not get the worksheet metadata 
either.

The metadata that I need to retrieve from the older excel format is the "Named 
Fields".

Users create them using the Excel "Named Box" feature as shown here. => 
http://spreadsheets.about.com/od/exceltips/qt/81225namebox.htm

It looks like my only option is to use the Apache POI Java API to extract the 
named fields or use it to convert xls-to-xlsx on-the-fly. 
=>https://poi.apache.org/apidocs

I know there's a hidden way to use MarkLogic's underlying JVM.

It would be great if I could use it to call the Apache POI code.

But that's a question for another day.

Thanks again,

Gary Russo


Gary Russo
Enterprise NoSQL Developer
http://garyrusso.wordpress.com
http://twitter.com/garyprusso



From: 
[email protected]<mailto:[email protected]>
 
[mailto:[email protected]<mailto:[email protected]>]
 On Behalf Of David Ennis
Sent: Thursday, October 16, 2014 5:02 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Is there a way to extract worksheet 
metadata from an Excel 97/2003?

HI.

I believe that with the conversion licence, you can do what you want with: 
xdmp:excel-convert

Barring that, you could always run openoffice as a headless server for 
conversion purposes.

Kind Regards,
David Ennis





Kind Regards,
David Ennis


David Ennis
Content Engineer

[HintTech] <http://www.hinttech.com/>
Mastering the value of content
creative | technology | content

Delftechpark 37i
2628 XJ Delft
The Netherlands
T: +31 88 268 25 00
M: +31 63 091 72 80

[http://www.hinttech.com]<http://www.hinttech.com> 
[http://www.hinttech.com/signature/Twitter_HintTech.png] 
<https://twitter.com/HintTech>  
[http://www.hinttech.com/signature/Facebook_HintTech.png] 
<http://www.facebook.com/HintTech>  
[http://www.hinttech.com/signature/Linkedin_HintTech.png] 
<http://www.linkedin.com/company/HintTech>

On 16 October 2014 20:00, Gary Russo 
<[email protected]<mailto:[email protected]>> wrote:
I need to extract worksheet metadata called "defined name" from Excel 97/2003 
formatted spreadsheets.

The ISYS xdmp:document-filter() API is limiting because it only extracts the 
text.

It does not extract any worksheet metadata.

Does anyone know of a workaround for this?

My only thought is to upload the "Excel 97/2003" xls file and then convert it 
on the server to an "Excel 2010" xlsx format.

Once it's in an Excel 2010 format, I can easily extract the "defined name" 
metadata.

This is what it looks like in "Excel 2010" files.

  <definedNames>
    <definedName name="LastYr">Revenue!$B$6:$B$15</definedName>
    <definedName name="ThisYr">Revenue!$C$6:$C$15</definedName>
    <definedName name="Variance">Revenue!$D$6:$D$15</definedName>
  </definedNames>


Thanks,
Gary Russo


Gary Russo
Enterprise NoSQL Developer
Phone: 212-404-8639<tel:212-404-8639>
Skype: garyprusso
http://garyrusso.wordpress.com


_______________________________________________
General mailing list
[email protected]<mailto:[email protected]>
http://developer.marklogic.com/mailman/listinfo/general

_______________________________________________
General mailing list
[email protected]<mailto:[email protected]>
http://developer.marklogic.com/mailman/listinfo/general

_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general

Reply via email to