Hi Joydeep,

 

There is currently no XHTML conversion for OOXML packages, so you'll have
to write your own transformation.
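
To give a feel for what such a transformation involves, here is a minimal sketch in Python rather than XQuery, run outside the server. It is only an illustration under stated assumptions: the `to_xhtml` function is hypothetical, it handles nothing but paragraphs, runs and a bare <w:b/> bold flag, and a real transformation would also need to consult styles.xml.

```python
import html
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by word/document.xml.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def to_xhtml(document_xml):
    """Convert document.xml text to a bare-bones XHTML string (hypothetical helper)."""
    root = ET.fromstring(document_xml)
    paragraphs = []
    for para in root.iter(f"{{{W}}}p"):
        pieces = []
        for run in para.findall("w:r", NS):
            text = "".join(t.text or "" for t in run.findall("w:t", NS))
            text = html.escape(text)
            # Simplification: any <w:b> counts as bold; w:val="0" is ignored.
            if run.find("w:rPr/w:b", NS) is not None:
                text = f"<b>{text}</b>"
            pieces.append(text)
        paragraphs.append("<p>" + "".join(pieces) + "</p>")
    body = "".join(paragraphs)
    return f'<html xmlns="http://www.w3.org/1999/xhtml"><body>{body}</body></html>'


SAMPLE = (
    f'<w:document xmlns:w="{W}"><w:body><w:p>'
    '<w:r><w:t xml:space="preserve">Hello </w:t></w:r>'
    "<w:r><w:rPr><w:b/></w:rPr><w:t>World</w:t></w:r>"
    "</w:p></w:body></w:document>"
)

if __name__ == "__main__":
    print(to_xhtml(SAMPLE))
```

The same walk over <w:p> and <w:r> elements is what an XQuery version would do against the extracted document.xml.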

 

The 'Office OpenXML Extract' pipeline will extract the pieces from a
.docx package into a parts folder.

 

Example:  if you save the document "HelloWorld.docx" to a domain with
the 'Office OpenXML Extract' pipeline attached, you'll find the binary,
along with all the parts in the "parts" folder.

HelloWorld.docx

HelloWorld_docx_parts/word/document.xml

HelloWorld_docx_parts/[Content_Types].xml

HelloWorld_docx_parts/_rels/.rels

HelloWorld_docx_parts/etc.     ...   (rest of parts)
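
As a rough illustration of that layout (in Python, outside MarkLogic, and not the pipeline's own code), the sketch below lists the parts of a .docx package and maps each one to the URI pattern shown above. The `parts_uris` and `build_sample_package` helpers are hypothetical, and the sample package holds placeholder content; only the "HelloWorld_docx_parts/" naming is taken from the example.

```python
import io
import zipfile


def parts_uris(docx_name, docx_bytes):
    """Return one URI per part, following the naming in the listing above.

    Assumption: the folder name is the file name with every '.' replaced
    by '_' plus '_parts', which matches the HelloWorld.docx example.
    """
    folder = docx_name.replace(".", "_") + "_parts"
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        return [f"{folder}/{part}" for part in package.namelist()]


def build_sample_package():
    """Build a tiny stand-in package in memory (part contents are placeholders)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("[Content_Types].xml", "<Types/>")
        package.writestr("_rels/.rels", "<Relationships/>")
        package.writestr("word/document.xml", "<document/>")
    return buffer.getvalue()


if __name__ == "__main__":
    for uri in parts_uris("HelloWorld.docx", build_sample_package()):
        print(uri)
```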

 

You can query and display the pieces in CQ. 

 

Styling information for the .docx can be found in the <w:rPr> (run
properties) and <w:pPr> (paragraph properties) elements in document.xml.
Their child elements contain either direct formatting or references to
the styles.xml part of the package, which holds all style definitions
for the document.
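
A small sketch of how those two kinds of styling can be told apart, again in Python and outside the server. The `describe_paragraphs` helper and the sample markup are made up for illustration; <w:pStyle> and <w:b> are standard WordprocessingML elements.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

SAMPLE = f"""<w:document xmlns:w="{W}">
  <w:body>
    <w:p>
      <w:pPr><w:pStyle w:val="Heading1"/></w:pPr>
      <w:r><w:t>Hello</w:t></w:r>
    </w:p>
    <w:p>
      <w:r><w:rPr><w:b/></w:rPr><w:t>World</w:t></w:r>
    </w:p>
  </w:body>
</w:document>"""


def describe_paragraphs(document_xml):
    """Return (text, paragraph style id, run property names) for each <w:p>."""
    root = ET.fromstring(document_xml)
    report = []
    for para in root.iter(f"{{{W}}}p"):
        # <w:pStyle w:val="..."> is a reference to a definition in styles.xml.
        style = para.find("w:pPr/w:pStyle", NS)
        style_id = style.get(f"{{{W}}}val") if style is not None else None
        # Local names of all <w:rPr> children, e.g. 'b' for direct bold.
        # A <w:rStyle> child, if present, would be a style reference instead.
        run_props = sorted({child.tag.split("}")[1]
                            for rpr in para.findall("w:r/w:rPr", NS)
                            for child in rpr})
        text = "".join(t.text or "" for t in para.iter(f"{{{W}}}t"))
        report.append((text, style_id, run_props))
    return report


if __name__ == "__main__":
    for row in describe_paragraphs(SAMPLE):
        print(row)
```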

 

Hope this helps,

Pete

 

From: Joydeep_Sinha [mailto:[EMAIL PROTECTED] 
Sent: Friday, November 14, 2008 1:31 PM
To: Pete Aven
Cc: Vivek_Nagasundara; [email protected]
Subject: RE: [MarkLogic Dev General] Important: Resolution of an Issue
in Marklogic

 

Hi Pete,

 

Just a few more queries. Until now, the way we handled other file
formats in MarkLogic was:

*         Select a file of any particular format.

*         Upload it into MarkLogic using the xdmp-load query.

*         During this process, MarkLogic converted the file into XHTML and
XML components and generated a "parts" folder for storing the images
associated with the content.

*         Display it in the CQ editor using a doc query.

 

Can you please help us correlate the above process with the docx format?
That is, is there a single file, like the XHTML, that holds the styling
information of the ingested docx file, or do we need to construct it?

 

Thanks,

Joydeep Sinha

Onsite Co-ordinator  - IDMF PoC

Media and Entertainment - Solution Offerings

Satyam Computer Services Limited.

Mobile - (001)-6103020388

 

 

 

From: Joydeep_Sinha 
Sent: Thursday, November 13, 2008 1:58 PM
To: 'Pete Aven'
Cc: Vivek_Nagasundara
Subject: RE: [MarkLogic Dev General] Important: Resolution of an Issue
in Marklogic

 

Hi Pete,

 

Thanks for your comments. We will certainly try this out and get back to
you if we face further implementation challenges.

 

Thanks,

Joydeep Sinha

Media and Entertainment - Solution Offerings

Satyam Computer Services Limited.

Mobile - (001)-6103020388

 

 

 

From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Pete Aven
Sent: Wednesday, November 12, 2008 2:36 PM
To: General Mark Logic Developer Discussion
Cc: Vivek_Nagasundara; Thangavelu_Senniyappan
Subject: RE: [MarkLogic Dev General] Important: Resolution of an Issue
in Marklogic

 

Hi Joydeep,

 

In MarkLogic Server 4.0, check out the 'Office OpenXML Extract' and
'WordprocessingML Process' pipelines in Content Processing.  

 

These are not enabled by default when you install Content Processing, so
you will have to attach them to your domain.  These 2 pipelines, along
with 'Status Change Handling', will process Word 2007 documents saved to
the Server.

 

'Office OpenXML Extract':  extracts the parts from a .docx package into a
directory named for the originating file.

'WordprocessingML Process':  updates document.xml (extracted from every
.docx package) by merging text split across runs (<w:r> elements), which
helps improve search results and cleans up the content for repurposing.
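
To show the idea of merging runs (this is not the pipeline's actual implementation, whose exact rules are not described here), the Python sketch below folds a run into the previous one when both carry identical <w:rPr> formatting. The `merge_runs` helper and its rule of only touching runs that contain nothing but <w:rPr> and <w:t> are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
R, RPR, T = f"{{{W}}}r", f"{{{W}}}rPr", f"{{{W}}}t"
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"


def run_format(run):
    """Serialize a run's <w:rPr> so two runs' formatting can be compared."""
    rpr = run.find(RPR)
    return ET.tostring(rpr).strip() if rpr is not None else b""


def is_simple_text_run(element):
    """True for a <w:r> that holds only an optional <w:rPr> plus <w:t> text."""
    return (element.tag == R
            and element.find(T) is not None
            and all(child.tag in (RPR, T) for child in element))


def merge_runs(paragraph):
    """Merge adjacent simple runs with identical formatting, in place."""
    previous = None
    for child in list(paragraph):
        if not is_simple_text_run(child):
            previous = None  # anything else breaks the chain of mergeable runs
            continue
        if previous is not None and run_format(previous) == run_format(child):
            target = previous.findall(T)[-1]
            extra = "".join(t.text or "" for t in child.findall(T))
            target.text = (target.text or "") + extra
            # Keep leading/trailing spaces significant in the merged text.
            target.set(XML_SPACE, "preserve")
            paragraph.remove(child)
        else:
            previous = child


SAMPLE = (
    f'<w:p xmlns:w="{W}">'
    "<w:r><w:t>Hel</w:t></w:r>"
    "<w:r><w:t>lo </w:t></w:r>"
    "<w:r><w:rPr><w:b/></w:rPr><w:t>World</w:t></w:r>"
    "</w:p>"
)

if __name__ == "__main__":
    para = ET.fromstring(SAMPLE)
    merge_runs(para)
    print([run.find(T).text for run in para.findall(R)])
```

With the word "Hello" no longer split across two runs, a word or phrase search against the text node can match it directly.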

 

It's also easy to assemble Word documents on the server by using the
xdmp:zip* utilities.
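
On the server you would use the xdmp:zip* functions for this; the sketch below uses Python's standard zipfile module instead, purely to show which parts a minimal package needs. The `assemble_docx` helper is hypothetical, and the three parts shown are what I understand to be the smallest useful set; a real document will usually carry more (styles.xml, docProps, and so on).

```python
import io
import zipfile
from xml.sax.saxutils import escape

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

CONTENT_TYPES = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
  <Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
  <Default Extension="xml" ContentType="application/xml"/>
  <Override PartName="/word/document.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>
</Types>"""

RELS = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
  <Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument" Target="word/document.xml"/>
</Relationships>"""


def assemble_docx(text):
    """Zip the three parts into .docx bytes holding one paragraph of text."""
    document = (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
        f'<w:document xmlns:w="{W}"><w:body><w:p><w:r>'
        f"<w:t>{escape(text)}</w:t>"
        "</w:r></w:p></w:body></w:document>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("[Content_Types].xml", CONTENT_TYPES)
        package.writestr("_rels/.rels", RELS)
        package.writestr("word/document.xml", document)
    return buffer.getvalue()


if __name__ == "__main__":
    with open("HelloWorld.docx", "wb") as handle:
        handle.write(assemble_docx("Hello, World"))
```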

 

Hope this helps,

Pete

 

From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of
Joydeep_Sinha
Sent: Wednesday, November 12, 2008 10:44 AM
To: [email protected]
Cc: Vivek_Nagasundara; Thangavelu_Senniyappan
Subject: [MarkLogic Dev General] Important: Resolution of an Issue
in Marklogic
Importance: High

 

Hi All,

 

I am from Satyam Computer Services Limited, and we generally build
solutions on top of MarkLogic. We are currently using MarkLogic to
upload docx files (MS Office 2007 format), but we are unaware of
MarkLogic's capabilities for converting them to XHTML/XML components.
Please confirm how we can ingest docx files into MarkLogic and how the
latest version of MarkLogic supports the latest Office formats.

 

It would be great if you could provide the exact XQuery for handling
this, or tell us what change is required to allow Office 2007 formats to
be ingested into and retrieved from MarkLogic.

 

A quick resolution would be greatly appreciated.

 

Thanks and Regards,

Joydeep Sinha

Onsite Co-ordinator  - IDMF PoC

Media and Entertainment - Solution Offerings

Satyam Computer Services Limited.

Mobile - (001)-6103020388

 

 

________________________________

DISCLAIMER:
This email (including any attachments) is intended for the sole use of
the intended recipient/s and may contain material that is CONFIDENTIAL
AND PRIVATE COMPANY INFORMATION. Any review or reliance by others or
copying or distribution or forwarding of any or all of the contents in
this message is STRICTLY PROHIBITED. If you are not the intended
recipient, please contact the sender by email and delete all copies;
your cooperation in this regard is appreciated.

 

_______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general
