That should work. I just tried it on 8.0-1.1 on Windows and got the expected 
results.

If you're using CPF, confirm that you have the following pipelines 
enabled:

Status Change Handling
Office OpenXML Extract

For Office 2007 and later (docs ending with a .docx, .pptx, or .xlsx extension), 
the file is a zipped package of XML parts, so you can work with the native 
OpenXML format directly once you've extracted the contents using the 
Office OpenXML Extract pipeline.

Once inserted, the original doc will be saved in MarkLogic as:
/myDoc/UtilizationReport.xlsx              //the original doc

Once the original doc has been processed by Office OpenXML Extract, you should 
see the extracted parts in MarkLogic as well:
/myDoc/UtilizationReport_xlsx_parts   //with a bunch of .xml files here in SpreadsheetML format

The CPF state on the .xlsx will be: http://marklogic.com/states/extracted
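
To check this from Query Console, something along these lines might help. It is only a sketch: partsDirectory is a hypothetical helper, and the naming rule it applies (last "." replaced by "_", plus "_parts/") is inferred from the example URIs in this thread rather than taken from the pipeline source. The URI is the one from the original load. The MarkLogic calls are guarded so the helper can also be tried outside the server.

```javascript
// Hypothetical helper: derive the parts directory from the original URI.
// Assumption: the pipeline names the directory by replacing the last "."
// of the file name with "_" and appending "_parts/".
function partsDirectory(uri) {
  var slash = uri.lastIndexOf("/");
  var dot = uri.lastIndexOf(".");
  // Only treat the dot as an extension separator if it is in the file name.
  var stem = dot > slash ? uri.slice(0, dot) + "_" + uri.slice(dot + 1) : uri;
  return stem + "_parts/";
}

var uri = "/myDoc/UtilizationReport.xlsx";
var dir = partsDirectory(uri);

// Collect the URIs of everything under the parts directory.
// This part only runs inside MarkLogic, where xdmp is defined.
var partUris = [];
if (typeof xdmp !== "undefined") {
  for (var doc of xdmp.directory(dir, "infinity")) {
    partUris.push(xdmp.nodeUri(doc));
  }
}

partUris;
```

An empty result suggests the extract pipeline never ran on the document. Running xdmp.documentProperties(uri) on the original URI shows the cpf:state property to compare against the state above.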

If you already have those two pipelines enabled, you may want to disable the 
others and see whether you get the expected results, to ensure that no 
pipelines are conflicting with each other when they process the document.

Hope this helps,
Pete



From: [email protected] 
[mailto:[email protected]] On Behalf Of Javier Lizarraga
Sent: Thursday, March 26, 2015 7:51 PM
To: [email protected]
Subject: [MarkLogic Dev General] Converting MS Office documents

Hello Developers,

I want to load an MS Excel file, filename.xlsx, into a MarkLogic database 
(using ML8) and be able to access the contents of the Excel document.
I enabled triggers for the database, then installed and enabled Content 
Processing. I followed the ML document below:
http://docs.marklogic.com/guide/cpf/default

Loaded:
declareUpdate();
// Load the spreadsheet from the local filesystem into the database.
xdmp.documentLoad("C:\\Users\\jlizarraga\\Documents\\UtilizationReport.xlsx",
    {
      "uri" : "/myDoc/UtilizationReport.xlsx",
      "permissions" : xdmp.defaultPermissions()
    });

When I load my UtilizationReport.xlsx file I can see the associated properties 
in Query Console:
<?xml version="1.0" encoding="UTF-8"?>
<prop:properties xmlns:prop="http://marklogic.com/xdmp/property">
  <cpf:processing-status xmlns:cpf="http://marklogic.com/cpf">done</cpf:processing-status>
  <cpf:property-hash xmlns:cpf="http://marklogic.com/cpf">d41d8cd98f00b204e9800998ecf8427e</cpf:property-hash>
  <cpf:last-updated xmlns:cpf="http://marklogic.com/cpf">2015-03-26T16:24:16-07:00</cpf:last-updated>
  <cpf:state xmlns:cpf="http://marklogic.com/cpf">http://marklogic.com/states/converted</cpf:state>
  <cpf:self xmlns:cpf="http://marklogic.com/cpf">/myDoc/UtilizationReport.xlsx</cpf:self>
</prop:properties>

It appears to me that it was successful but I do not see any other associated 
documents besides the UtilizationReport.xlsx file reference.

I was expecting to see:
UtilizationReport.xlsx  (Original Document)
UtilizationReport_xlsx.xml
UtilizationReport_xlsx.xhtml
A Directory called UtilizationReport_xlsx_Parts

I don't see any errors.  Any help would be greatly appreciated.

Thanks,

Javier
_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general
