Hi Jakob,

I received the .zip.  Can you please also send me the source .docx that this 
test.xml was generated from?  Something else is going on here, so I'd like to 
try to recreate your issue.

Also, I'm assuming this is the docx that you've saved to MarkLogic and that you 
are trying to use as a search hit for insert into another document (for testing 
purposes)?

This test.xml document opens in Word 2010 in compatibility mode (which 
signifies a 2007 doc).  If you roundtrip a 2010 doc, this shouldn't be the case 
if you're using the latest .xqy, at least in my testing.  Also, simple docs 
don't generate footnotes for me.  The code in word-processing-ml-support.xqy 
accounts for the footnotes.xml part in OPC generation, but maybe you've 
discovered a bug.
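
To look at the footnotes part in isolation, something like the rough sketch below could pull a single pkg:part out of the saved package XML. The helper name extractPart is made up for illustration and is not part of the MLA API or the toolkit. It assumes the package uses the pkg: prefix, as in the debug output.

```javascript
// Sketch only: pulls one pkg:part out of a flat OPC package string.
// extractPart is a hypothetical helper, not part of the MLA API or the toolkit.
function extractPart(packageXml, partName) {
  // pkg:part elements do not nest, so a non-greedy match up to the first
  // closing tag is enough. Self-closed parts (<pkg:part ... />) are skipped.
  var partRe = /<pkg:part\b((?:[^>\/]|\/(?!>))*)>([\s\S]*?)<\/pkg:part>/g;
  var match;
  while ((match = partRe.exec(packageXml)) !== null) {
    var nameAttr = /pkg:name="([^"]*)"/.exec(match[1]);
    if (nameAttr && nameAttr[1] === partName) {
      return match[2]; // everything between the part's open and close tags
    }
  }
  return null; // part not present in the package
}
```

Calling extractPart(packageXml, "/word/footnotes.xml") returns the contents of that part, or null if the package has no such part.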

>>. Inside the /word/document.xml part, contents is simplified, for example, an 
>>element for signalling a spelling error has been removed, but otherwise it 
>>looks very much the same

There's the XML Office will consume, and there's the XML it will produce.  We 
aim to keep it simple when working with the formats and provide Office the 
minimum XML for ingest to still get the desired results for the author in the 
active document as well as for the next time the document is saved in Office.

>>. The function ooxml:get-directory-package in the latest 
>>word-processing-ml-support.xqy seems to take the different components in the 
>>order as returned by cts:directory-query which makes me think that order is 
>>not important.

Order does not matter.
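
Since part order is not significant, any comparison of two packages is better done on the set of part names than on the raw text. A minimal sketch, assuming both packages use the pkg: prefix (listPartNames and samePartSet are hypothetical helper names, not toolkit functions):

```javascript
// Sketch only: compares two flat OPC packages by part name, ignoring order.
// listPartNames and samePartSet are hypothetical helpers for illustration.
function listPartNames(packageXml) {
  var names = [];
  // Capture the pkg:name attribute of every pkg:part start tag.
  var partRe = /<pkg:part\b[^>]*?pkg:name="([^"]*)"/g;
  var match;
  while ((match = partRe.exec(packageXml)) !== null) {
    names.push(match[1]);
  }
  return names.sort(); // sorted so that document order no longer matters
}

function samePartSet(packageA, packageB) {
  return listPartNames(packageA).join("\n") === listPartNames(packageB).join("\n");
}
```

This only checks which parts are present. It says nothing about whether the contents of each part are valid.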

>>* I did not have the WordprocessingML Process pipeline activated.
>>However, once activated the insertion still didn't succeed. (The description 
>>of this pipeline indicates that it's about merging similar runs. I did notice 
>>when comparing the XML, that w:r elements were merged, so I'd guess that 
>>works).

In case you're interested, the pipeline solves this problem: 
http://community.marklogic.com/blog/smallchanges/2007-12-18

It really shouldn't matter whether it's activated; it was a guess at what XML 
might be getting tripped up during OPC generation, made without knowing what 
your docs looked like.

Thanks again,
Pete



-----Original Message-----
From: [email protected] 
[mailto:[email protected]] On Behalf Of Jakob Fix
Sent: Friday, February 24, 2012 5:58 AM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Word add-in and Word 2010

Pete,

thanks for your replies, I very much appreciate them.

I saved the package XML that is about to be inserted in the current Word 
document via the Developer Tools in IE9 and tried to open it in Word (before 
doing that I added the processing instruction so that Windows launches Word 
instead of my XML editor).  This failed and Word tells me why:

---
The file test.xml cannot be opened because there are problems with the contents.

Details

The XML data is invalid according to the schema.

Location: Part: /word/footnotes.xml, Line: 3, Column: 191
---

It then offers to attempt to recover the contents, and that succeeds.
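
For reference, a minimal sketch of adding the processing instruction to the saved package XML. It assumes the instruction in question is the usual mso-application one with progid Word.Document; the helper name addWordProcessingInstruction is invented for illustration.

```javascript
// Sketch only: prepends the Word processing instruction to a flat OPC package.
// addWordProcessingInstruction is a hypothetical helper name.
function addWordProcessingInstruction(packageXml) {
  var pi = '<?mso-application progid="Word.Document"?>';
  if (packageXml.indexOf("<?mso-application") !== -1) {
    return packageXml; // already present, leave the package untouched
  }
  // If there is an XML declaration, it has to stay first, so the
  // processing instruction goes right after it.
  var decl = /^\s*<\?xml\s[^>]*\?>/.exec(packageXml);
  if (decl) {
    return decl[0] + "\n" + pi + packageXml.slice(decl[0].length);
  }
  return pi + "\n" + packageXml;
}
```

With the instruction in place, Windows should hand the .xml file to Word rather than to the default XML editor.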

So it quite clearly says the document is not valid according to its schema. I 
compared a basic OPC file created by Word with the one generated by MarkLogic 
(although one cannot expect them to be the same, as the generated one should 
only contain the part of a document in the database that has a search hit), and 
the main difference seems to be the order of the pkg:parts. Inside the 
/word/document.xml part, contents is simplified, for example, an element for 
signalling a spelling error has been removed, but otherwise it looks very much 
the same. The function ooxml:get-directory-package in the latest 
word-processing-ml-support.xqy seems to take the different components in the 
order as returned by cts:directory-query which makes me think that order is not 
important. But I don't have a schema handy to validate it.


Regarding the checks:
* documents are saved OK via WebDAV. I can open them directly from Word, and as 
you mentioned hits are found. The extraction pipelines are also executed as the 
_parts directory is created.
* I'm using MarkLogic 5.0-2
* I had installed the latest version of the word-processing-ml-support.xqy in 
Modules/MarkLogic/openxml.
* I did not have the WordprocessingML Process pipeline activated.
However, once activated the insertion still didn't succeed. (The description of 
this pipeline indicates that it's about merging similar runs. I did notice when 
comparing the XML, that w:r elements were merged, so I'd guess that works).

So, in summary, the package XML retrieved from MarkLogic contains the different 
parts in a different order than how Word creates them.
Otherwise I cannot see the differences. For information, I added the XML as a 
zip file to this mail. If it doesn't make it through to the list, I'll send it 
to you off-list.

cheers,
Jakob.



On Thu, Feb 23, 2012 at 19:24, Pete Aven <[email protected]> wrote:
> Jakob!
>
>>>These documents are DOCX and were created by me when playing around with the 
>>>tool kit and saved directly to MarkLogic via WebDAV. Now, given that the 
>>>error message is the same as above and it was inappropriate there, I wonder 
>>>what the reason might be here.
>
> The error message usually indicates that Word doesn't like the XML you are 
> trying to insert.  So, something may be wrong with the XML created for insert.
>
> Things to check:
>
> 1) You can validate the documents are indeed saved to ML after using
> WebDAV
>        WebDAV often does not work properly, especially on Windows. I'm 
> assuming it works as you get search results, but, just in case.
>
> 2)  Office Open XML Extract and WordprocessingML Process pipelines are
> enabled
>        I know the former is, but the latter?
>        For WordprocessingML Process, which version of the server are you 
> using?  5.0 supports the 2010 format, but previous versions do not.  Let me 
> know if you are using an earlier version and I can forward the appropriate 
> files (there are 2 .xqy, they're small).
>
> 3) Did you copy over the latest version of word-processing-ml-support.xqy 
> that I sent you in the .zip to <server-root>/ Modules/MarkLogic/openxml ?
>        This latest copy has support for the 2010 flavor of WordprocessingML, 
> where the one downloadable from Community does not.
>
>>> Interestingly enough, I'm not getting any results for words appearing in 
>>> the boilerplate documents, are they excluded from the search?
>
> When you enrich a document in Word, it adds what are called 'Content 
> Controls' around the selected sections within the Word application.  In the 
> XML, these manifest themselves as Structured Document Tags; w:sdt elements.
>
> Searches are performed against any text found within child elements of w:sdt.
>
> When you insert, the search hit (the w:sdt from the source document) is 
> formatted using the XQuery API into a Word document in the OPC format (at 
> least, it should be when everything is working correctly).  This OPC document 
> is then inserted into the active document through MLA.insertWordOpenXML().
>
> The boilerplates probably don't have any content within w:sdt tags and 
> therefore are not showing up in searches.
>
> You can of course change the search by modifying
> Author/search/search.xqy, but let's not go there til we sort out
> insert. :)
>
> Hope this helps,
> Pete
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Jakob
> Fix
> Sent: Thursday, February 23, 2012 12:15 PM
> To: MarkLogic Developer Discussion
> Subject: Re: [MarkLogic Dev General] Word add-in and Word 2010
>
> Hi Pete,
>
> OK, I grok the boilerplate functionality now (I somehow expected the files to 
> already exist behind the buttons). The error message about "XML markup [that] 
> cannot be inserted in the specified location" was kind of misleading. But 
> that's cool.  I've created a couple of documents with different styles and 
> they are maintained on insert, which is what you would expect when you know 
> it's actually the OPC XML that's being copied and pasted, but still nice.
>
> We're making progress, thanks a lot. :)
>
> Next up is search: My search finds hits in docx documents right now, and the 
> debug alert about the contents of the XML about to be inserted shows OPC XML, 
> here's a bit:
>
> PACKAGE XML IS <pkg:package
> xmlns:pkg="http://schemas.microsoft.com/office/2006/xmlPackage"><pkg:part
> pkg:name="/word/glossary/fontTable.xml"
> pkg:contentType="application/vnd.openxmlformats-officedocument.wordprocessingml.fontTable+xml"><pkg:xmlData><w:fonts
> mc:Ignorable="w14"
> xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
> xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml" .
>
> These documents are DOCX and were created by me when playing around with the 
> tool kit and saved directly to MarkLogic via WebDAV. Now, given that the 
> error message is the same as above and it was inappropriate there, I wonder 
> what the reason might be here. Clearly, this is not about the cursor being at 
> the wrong position. By the way, the "Open" button for each search result 
> works fine and opens the document as expected.  One of the search results is 
> a "Section" the other one a "Policy".
>
> Interestingly enough, I'm not getting any results for words appearing in the 
> boilerplate documents, are they excluded from the search?
>
> cheers,
> Jakob.
>
>
>
> On Thu, Feb 23, 2012 at 14:54, Pete Aven <[email protected]> wrote:
>> Hi Jakob,
>>
>> Are you trying to insert from the boilerplate tab, from a search hit, or 
>> both?
>>
>> To test boilerplate: save a document as XML from Word. (just as XML, not 
>> 2003 XML), save this to the database, and reference it in the config file 
>> found at Author/config/boilerplate.xml.
>>
>> Documents saved as XML are saved in what Microsoft calls OPC format. See 
>> http://community.marklogic.com/blog/smallchanges/2009-01-08 for more details.
>>
>> Then restart Word, place your cursor somewhere in the document, go to the 
>> boilerplate tab in the application, and click the button for the boilerplate 
>> you just added.
>>
>> You'll see that the code for boilerplate insert fetches the document from 
>> the Server and passes it to insertWordOpenXML() which inserts it at the 
>> current cursor location.  If this works, we're on the right track.
>>
>> The insert function from the button on a search hit takes a component found 
>> in a search (a component being anything previously enriched from the enrich 
>> tab in the Authoring application and saved to MarkLogic) and uses the 
>> XQuery API to format it as OPC, before inserting it into the doc using the 
>> insertWordOpenXML() function.
>>
>> Are you starting with existing docs?  Or docs from SharePoint?  These
>> may have XML elements we haven't seen yet that aren't accounted for
>> in the XQuery API and so may cause an issue. You may want to start by
>> Authoring new docs to test the functionality, then hammer it with
>> your existing docs to break it. :)
>>
>> Hope this helps,
>> Pete
>>
>>
>>
>> -----Original Message-----
>> From: [email protected]
>> [mailto:[email protected]] On Behalf Of Jakob
>> Fix
>> Sent: Thursday, February 23, 2012 8:39 AM
>> To: MarkLogic Developer Discussion
>> Subject: Re: [MarkLogic Dev General] Word add-in and Word 2010
>>
>> After having installed the Author sample application which is really
>> rather cool, the insertion into the current document still doesn't
>> work.  While trying to understand, I noticed that the insert
>> functionality expects the URI lexicon to be enabled which wasn't
>> mentioned in the documentation but enabling that got me one step
>> further.  Now it seems that the items found cannot be inserted
>> anywhere in the current document. By the way, I stuck with the
>> defaults right now (i.e. Policies, Sections, Recommendations). That's
>> the error message: ERROR error: XML markup cannot be inserted in the
>> specified location. which I was able to track down to this function
>> in
>> MarkLogicWordAddin:
>>
>> line 1368 MLA.insertWordOpenXML = function(opc_xml)
>>
>> and more particularly that line:
>>
>> line 1381 window.external.insertWordOpenXML(v_docx);
>>
>> Glad for any ideas
>>
>> cheers,
>> Jakob.
>>
>>
>>
>> On Wed, Feb 22, 2012 at 17:59, Jakob Fix <[email protected]> wrote:
>>> Thanks Pete,
>>>
>>> That's extra quick! :)
>>> I got the zip and am updating the msi as we speak.
>>>
>>> cheers,
>>> Jakob.
>>>
>>>
>>>
>>> On Wed, Feb 22, 2012 at 17:45, Pete Aven <[email protected]> wrote:
>>>> Hi Jakob,
>>>>
>>>>>>1) this add-in is supported for Word 2010
>>>>
>>>> Though the Addin will install with Office 2010; the XQuery API with the 
>>>> Toolkit that is currently available on the Community site is only 
>>>> compatible with the 2007 flavor of WordprocessingML.
>>>>
>>>> The TK has been updated for 2010 support and is currently sitting in a 
>>>> repository where I'm told it will be released onto the unsuspecting, 
>>>> Office 2010-hungry masses at some point in the future.  Until then, I've 
>>>> sent you a snapshot of the latest TK to your gmail.
>>>>
>>>>>>2) if so how can one debug this Javascript code (is there a Firebug-like 
>>>>>>tool for this?).
>>>>
>>>> Unfortunately, not really.  Develop everything you can for the Addin 
>>>> application outside of the context of the Addin (in IE).  You can use IE8, 
>>>> which has developer tools similar to Firebug.  Once the application is in 
>>>> the Addin and calling the MLA functions, however, your only real option is 
>>>> to use alert()s (or write logs to the filesystem, which you can do with 
>>>> JavaScript in IE).
>>>>
>>>>>> MLA.insertBlockContent(response.responseXML);
>>>>
>>>> This function really should be deprecated.  Instead of the simple
>>>> Sample, I'd suggest using the Sample Authoring App to enrich/insert
>>>> content, and taking a look at the function MLA.insertWordOpenXML().
>>>> Once you grok this function, you will keep Word in a headlock and
>>>> pretty much have your way with it. :)
>>>>
>>>> Hope this helps,
>>>> Pete
>>>>
>>>> -----Original Message-----
>>>> From: [email protected]
>>>> [mailto:[email protected]] On Behalf Of Jakob
>>>> Fix
>>>> Sent: Wednesday, February 22, 2012 11:24 AM
>>>> To: General Mark Logic Developer Discussion
>>>> Subject: [MarkLogic Dev General] Word add-in and Word 2010
>>>>
>>>> Hello again,
>>>>
>>>> I've successfully installed the Word add-in and am able to search using 
>>>> the sample provided in the download.
>>>>
>>>> However, the double-click on a found paragraph does not insert it
>>>> into the currently open word document. Probably, things have
>>>> changed from
>>>> 2007 to 2010.  Looking at the Javascript code in Samples/search/search.js 
>>>> I find this line:
>>>>
>>>> MLA.insertBlockContent(response.responseXML);
>>>>
>>>> which seems to be responsible for the insertion of the paragraph.
>>>>
>>>> So, I guess my question is whether 1) this add-in is supported for Word 
>>>> 2010 and 2) if so how can one debug this Javascript code (is there a 
>>>> Firebug-like tool for this?).
>>>>
>>>> cheers,
>>>> Jakob.
>>>> _______________________________________________
>>>> General mailing list
>>>> [email protected]
>>>> http://developer.marklogic.com/mailman/listinfo/general