Jose Luis de la Rosa Triviño wrote:
> 
> I am testing CPSOOo and when I upload an odt or sxw file the content is
> not indexed.
>

On my side, with a CPS from trunk, both ODT and SXW files get indexed
when I create an OpenOffice.org DocBook Document:

2007-06-26T14:58:51 INFO PortalTransforms PATH FROM
application/vnd.oasis.opendocument.text TO application/docbook+xml :
[<Transform at ooo_to_docbook>]
------
2007-06-26T14:58:51 DEBUG ooo_to_docbook cmd = cd "/tmp/tmpULekmy" &&
/usr/local/zope/instance/cps/Products/PortalTransforms/transforms/ooo2dbk/ooo2dbk
--dbkfile unknown.docb.xml /tmp/tmpULekmy/unknown.sxw
2>"unknown.log-xsltproc"
------
2007-06-26T14:58:57 DEBUG CPSSchemas.FileUtils._convertFileToMimeType to
text/html for file <File at document.docb>
------
2007-06-26T14:58:57 INFO PortalTransforms PATH FROM
application/docbook+xml TO text/html : [<Transform at docbook_to_html>]
------
2007-06-26T14:58:57 DEBUG docbook_to_html cmd = cd "/tmp/tmpdBmkDr" &&
/usr/bin/xsltproc --novalid
/usr/local/zope/instance/cps/Products/PortalTransforms/transforms/docbook/custom-xhtml.xsl
/tmp/tmpdBmkDr/unknown.docb.xml >"unknown.docb.html"
2>"unknown.docb.log-xsltproc"
------
2007-06-26T14:59:01 DEBUG CPSSchemas.FileUtils._convertFileToMimeType to
text/plain for file <File at document.html>
------
2007-06-26T14:59:01 INFO PortalTransforms PATH FROM text/html TO
text/plain : [<Transform at lynx_dump>]
------

Once the chain reaches the lynx_dump transform, the HTML has been
converted to plain text, and it is this plain text that is used
for indexing.
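
To make the "PATH FROM ... TO ..." log lines easier to read, here is a
toy model of the chain. It is only an illustration and not the actual
PortalTransforms code: the MIME types and transform names are copied from
the log above, while TRANSFORMS and find_path are names I made up.

```python
from collections import deque

# (source MIME type, target MIME type, transform name), copied from the log.
TRANSFORMS = [
    ("application/vnd.oasis.opendocument.text",
     "application/docbook+xml", "ooo_to_docbook"),
    ("application/docbook+xml", "text/html", "docbook_to_html"),
    ("text/html", "text/plain", "lynx_dump"),
]


def find_path(source, target, transforms=TRANSFORMS):
    """Breadth-first search for the shortest chain of transform names
    leading from `source` to `target`; returns None if there is none."""
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        mime, path = queue.popleft()
        if mime == target:
            return path
        for src, dst, name in transforms:
            if src == mime and dst not in seen:
                seen.add(dst)
                queue.append((dst, path + [name]))
    return None
```

With this toy registry, asking for a path from
application/vnd.oasis.opendocument.text to text/plain goes through all
three transforms, ending with lynx_dump.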



What do you see on these two pages?
http://localhost:8080/cps/portal_transforms/ooo_to_docbook/manage_main
http://localhost:8080/cps/portal_transforms/ooo_to_docbook/manage_introspection


> In debug mode, the information thrown to the console is:
> 
> ---------------------------------------------
> 2007-06-25 12:54:18 INFO PortalTransforms PATH FROM
> application/vnd.oasis.opendocument.text TO application/docbook+xml :
> [<Transform at ooo_to_docbook>]
> There isn't any value for this parameter. There should be an error in
> your config.xml.
> ------------------------------------------------------
> 

This error message comes from the ooo2dbk program. It says that a
parameter in the ooo2dbk configuration file is empty when it should
not be. Note that the error message itself is wrong: the configuration
file is named ooo2dbk.xml, not config.xml.


> The name searched is "ooo2" and the content of the global variable
> configElts is:
> 
> -----------------------------------------------------------------------
> [(u'xslt-command', {u'param-syntax': u'--stringparam %s %s', u'command':
> u'xsltproc %v -o %o %s %i', u'name': u'xsltproc'}),
> (u'xslt-command', {u'param-syntax': u'%s=%s', u'command': u'java
> com.icl.saxon.StyleSheet -o %o %i %s %v', u'name': u'saxon'}),
> (u'xslt-command', {u'param-syntax': u'%s=%s', u'command': u'java
> com.icl.saxon.StyleSheet -r
> org.apache.xml.resolver.tools.CatalogResolver -x
> org.apache.xml.resolver.tools.ResolvingXMLReader -y
> org.apache.xml.resolver.tools.ResolvingXMLReader -u -o %o %i %s %v',
> u'name': u'saxon-cat'}),
> (u'xslt-stylesheet', {u'stylesheetPath': u'ooo2dbk.xsl', u'name':
> u'o2d4windows'}),
> (u'xslt-stylesheet', {u'stylesheetPath': u'ooo2dbk.xsl', u'name':
> u'o2d4unix'}),
> (u'dtd', {u'doctype-public': u'"-//OASIS//DTD DocBook XML V4.4//EN"',
> u'name': u'docbook44', u'doctype-system':
> u'"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"'}),
> (u'dtd', {u'doctype-public': u'"-//OASIS//DTD DocBook XML V4.3//EN"',
> u'name': u'docbook43', u'doctype-system':
> u'"http://www.oasis-open.org/docbook/xml/4.3/docbookx.dtd"'}),
> (u'dtd', {u'doctype-public': u'"-//OASIS//DTD DocBook XML V4.1.2//EN"',
> u'name': u'docbook412', u'doctype-system':
> u'"http://www.oasis-open.org/docbook/xml/4.0/docbookx.dtd"'}),
> (u'images', {u'imageNameRoot': u'img', u'imagesRelativeDirectory':
> u'images'}),
> (u'oooserver', {u'host': u'localhost', u'port': u'2002'}),
> (u'ole', {u'imgFormat': u'png', u'scriptPath': u'ole2img.py'}),
> (u'ooopython', {u'path': u'/usr/bin/python'})]
> ---------------------------------------------------------------------
> 

The content of configElts that you pasted looks like
the content of an old version of the config file.

Please check how many ooo2dbk.xml (or even config.xml) files you have
on your system. I think this is the problem: ooo2dbk also looks
for configuration files in /etc, and you might have an old version
of the ooo2dbk configuration files there.
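
If it helps, a small script along these lines can list the candidate
files. It is just a sketch: find_config_files and CONFIG_NAMES are names
I made up, and the directories to search are up to you (nothing here is
provided by ooo2dbk itself).

```python
import os

# File names taken from this thread: ooo2dbk.xml is the current name,
# config.xml is the name mentioned in the error message.
CONFIG_NAMES = ("ooo2dbk.xml", "config.xml")


def find_config_files(roots, names=CONFIG_NAMES):
    """Return the sorted paths of all files under `roots` whose file name
    is one of `names`. Missing or unreadable directories are skipped."""
    found = []
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for filename in filenames:
                if filename in names:
                    found.append(os.path.join(dirpath, filename))
    return sorted(found)
```

For example, find_config_files(["/etc", "/usr/local/zope/instance/cps/Products"])
would cover /etc and the instance path that appears in my logs above. Any
config.xml that belongs to another program will of course show up too, so
the result still needs a look by hand.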


> I have been looking at the file ooo2dbk.xml and it seems to have a
> statement for ooo2 as it has these lines:
> 
> ---------------------------------------------------------
> <xslt-stylesheet
>    name="ooo2"
>    stylesheetPath="ooo2dbk.odf.xsl"
>    />
> --------------------------------------------------------
> 

You're looking in the right direction.

You should have this file in your installation:
Products/PortalTransforms/transforms/ooo2dbk/ooo2dbk.odf.xsl
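
To check this for every stylesheet declared in a given ooo2dbk.xml, a
sketch like the following could be used. The element and attribute names
(xslt-stylesheet, name, stylesheetPath) come from your excerpt;
list_stylesheets is a made-up name, and I am assuming that the file uses
no XML namespace and that a relative stylesheetPath is resolved against
the directory containing the config file.

```python
import os
import xml.etree.ElementTree as ET


def list_stylesheets(config_path):
    """Map each xslt-stylesheet name found in `config_path` to a tuple
    (stylesheetPath, True if that file exists next to the config file)."""
    config_dir = os.path.dirname(os.path.abspath(config_path))
    result = {}
    for elt in ET.parse(config_path).getroot().iter("xslt-stylesheet"):
        name = elt.get("name")
        path = elt.get("stylesheetPath")
        # Assumption: relative paths are relative to the config directory.
        exists = path is not None and os.path.isfile(
            os.path.join(config_dir, path))
        result[name] = (path, exists)
    return result
```

If "ooo2" is missing from the result, or is present with False, then the
config file being read is not the one you expect or the stylesheet is not
installed next to it.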



> Why the OOO Document does not uses the same transform as the one used by
> the type "File"?
> 

The transformation for "File" goes from OOo to HTML, then to text.
The transformation for "OOo Documents" goes from OOo to DocBook,
then to HTML, and finally to text. The additional DocBook step
recovers more semantic information, so the HTML output is richer.
The main reason for this product is that some Nuxeo clients use DocBook
as their preferred open file format for long-term storage.

Cheers,

-- 
Marc-Aurèle DARCHE
Open Source Enterprise Content Management (ECM)   http://www.nuxeo.org/
NUXEO (Paris, France)                             http://nuxeo.com/

_______________________________________________
cps-devel mailing list
http://lists.nuxeo.com/mailman/listinfo/cps-devel
