Sorry to bother Authors with this, but I can't seem to get any response on OD-users...

Since Authors is also involved in translation, I thought we could discuss this in a separate thread. The topic is not so much the contents of a document as access to those contents for translation.

I wrote this mail with OmegaT in mind, since it seems to be used more and more within the OOo community to translate documents produced in OOo, rather than documents imported from the MS world.

The Italian documentation translation group appears to be using OmegaT, and thanks to their work we have discovered glitches in the support for OOo tags. I think the reason is that when OmegaT was mainly used to translate documents imported from MS formats, OOo's filtering of those documents was very conservative and produced very few OOo tags, all of which OmegaT handled properly. Now that people use OOo to create documents (this is especially true for the OOo documentation process), authors are free to use the full range of OOo's formatting options, and the resulting documents are much more complex.

As described in the mail below (originally three mails sent to xml-dev, combined into one and sent to OD-users), most XML-based translation standards expect three types of string formats in a document:

-block level format: the equivalent of <text:p> in OOo or <p> in HTML; it sets a property for a whole block of text.
-inline level format: the equivalent of <text:span> in OOo or <span> in HTML; it sets a property for a subset of the block.
-subflows: for example, the alternative text for a picture (the text that appears in a box when the mouse hovers over the picture, etc.). It is mostly _within_ the tag, as an attribute value, like <whatever:whatever name="alternate text"> in OOo, similar to <h2 id="identity of this title"> in HTML.

So that is the background to the mail below. If you are interested in these topics, please read on :)

Regards,
Jean-Christophe Helary

==================================
(Three mails combined into one. Already sent to xml-dev on OOo, but I thought this list might be more relevant.)

I would like to know if there is a "simple" way to identify:

-block level textual information
-inline level textual information
-localizable subflows present in tag attribute values.

The goal is to parse an OD file and properly segment its text for use in a CAT tool.
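To make the block/inline distinction concrete, here is a minimal Python sketch of such a segmentation pass over content.xml. It assumes a tiny, hand-picked classification (text:p and text:h as blocks, text:span and text:a as inline) precisely because no authoritative list is available; every other child element is kept as an opaque placeholder. The function names are hypothetical and not taken from OmegaT or any other tool.

```python
import itertools
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

# Assumed, deliberately incomplete classification for this sketch only:
# each block element becomes one segment, inline elements become paired
# placeholders inside that segment.
BLOCK = {f"{{{TEXT_NS}}}p", f"{{{TEXT_NS}}}h"}
INLINE = {f"{{{TEXT_NS}}}span", f"{{{TEXT_NS}}}a"}


def flatten(elem, counter):
    """Render a block's content, turning inline tags into <gN>...</gN> and any
    other child element into a standalone <xN/> placeholder."""
    parts = [elem.text or ""]
    for child in elem:
        n = next(counter)
        if child.tag in INLINE:
            parts.append(f"<g{n}>{flatten(child, counter)}</g{n}>")
        else:
            # Unknown or non-textual child (e.g. <text:s/>, a note, a frame):
            # keep its position but do not expose its content here.
            parts.append(f"<x{n}/>")
        parts.append(child.tail or "")  # text after the child belongs to this block
    return "".join(parts)


def extract_segments(content_xml) -> list[str]:
    """Return one source segment per block element, in document order.

    Paragraphs nested inside notes or text boxes are also found by iter()
    and therefore come out as separate segments of their own.
    """
    root = ET.fromstring(content_xml)
    return [flatten(e, itertools.count(1)) for e in root.iter() if e.tag in BLOCK]
```

The `<gN>`/`<xN/>` placeholders are loosely modelled on XLIFF's paired `<g>` and standalone `<x/>` inline elements; a real filter would also have to map them back to the original XML after translation.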

By "subflows" I mean information that is not between tags but inside them, as attribute values (e.g. text:name="something like alternative text for graphic items").
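Extracting subflows is a different operation from segmenting blocks, because the text lives in attribute values rather than in element content. The sketch below is a hypothetical, configuration-driven approach in Python; the two entries (text:string-value on text:alphabetical-index-mark and draw:name on draw:frame) are illustrative assumptions, not a vetted list of localizable attributes.

```python
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
DRAW_NS = "urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"
PREFIXES = {TEXT_NS: "text", DRAW_NS: "draw"}

# Hypothetical configuration: which attributes carry translatable text.
# Both entries are illustrative assumptions, not a list taken from the spec.
SUBFLOW_ATTRS = {
    f"{{{TEXT_NS}}}alphabetical-index-mark": [f"{{{TEXT_NS}}}string-value"],
    f"{{{DRAW_NS}}}frame": [f"{{{DRAW_NS}}}name"],
}


def short(name: str) -> str:
    """Turn ElementTree's '{uri}local' form back into 'prefix:local'."""
    uri, local = name[1:].split("}", 1)
    return f"{PREFIXES.get(uri, uri)}:{local}"


def extract_subflows(content_xml) -> list[tuple[str, str, str]]:
    """Collect (element, attribute, value) for every configured, non-empty attribute."""
    root = ET.fromstring(content_xml)
    found = []
    for elem in root.iter():
        for attr in SUBFLOW_ATTRS.get(elem.tag, []):
            value = elem.get(attr)
            if value:  # skip missing or empty attributes
                found.append((short(elem.tag), short(attr), value))
    return found
```

Each extracted value would become its own translation unit, separate from the paragraph in which the element appears.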

There seems to be a very wide range of possible <text:...> elements, and I found that <text:sequence> seems to be inline, as does <text:user-defined>.

I am looking for an extensive list of such elements and their characteristics (block or inline).
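Until such a list exists, one empirical way to build it is to survey real documents and record where each text: element occurs. The Python sketch below uses a rough heuristic that is an assumption, not something the specification defines: an element nested anywhere inside a text:p or text:h counts as inline, and everything else (including paragraphs and headings themselves) counts as block.

```python
import zipfile
import xml.etree.ElementTree as ET
from collections import defaultdict

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
TEXT_PREFIX = f"{{{TEXT_NS}}}"
PARAGRAPHS = {TEXT_PREFIX + "p", TEXT_PREFIX + "h"}


def survey(content_xml) -> dict[str, list[str]]:
    """Record, for every text:* element, whether it was seen inside a paragraph.

    Rough heuristic: anything nested in a <text:p>/<text:h> counts as "inline",
    except paragraphs and headings themselves, which always count as "block".
    """
    root = ET.fromstring(content_xml)
    seen = defaultdict(set)

    def walk(elem, inside_para):
        for child in elem:
            if child.tag.startswith(TEXT_PREFIX):
                local = child.tag[len(TEXT_PREFIX):]
                inline = inside_para and child.tag not in PARAGRAPHS
                seen[local].add("inline" if inline else "block")
            walk(child, inside_para or child.tag in PARAGRAPHS)

    walk(root, False)
    return {name: sorted(contexts) for name, contexts in sorted(seen.items())}


def survey_odt(path):
    """An OpenDocument file is a zip package; the body lives in content.xml."""
    with zipfile.ZipFile(path) as package:
        return survey(package.read("content.xml"))
```

Run with survey_odt() over a corpus of OOo-authored files, this would at least show which elements, such as text:sequence or text:user-defined, actually turn up inside paragraphs, though it cannot tell whether their content is localizable.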

The point is that localizable data in OD needs to be parsed properly, so that computer-assisted translation (CAT) tools (the applications translators use every day) give access to the right data within their translation framework.

It would be nice if OD provided such "meta" information as to what is localizable and what is not, and how it fits into the block/inline/subflow model that most translation-related standards (TMX, XLIFF, etc.) are based on.

Of course, developers can always check the specification and make guesses as to what is what, just as I am trying to do right now, but I would say that providing easy access to such meta information could be part of the OD specification, to ensure proper implementation on the CAT tool development side.
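As an illustration of the kind of machine-readable metadata meant here, the following Python sketch shows a hypothetical rule table that a CAT filter could consume. Neither the table format nor its entries come from the ODF specification; they are assumptions for the example, including the conservative choice of treating unknown elements as opaque inline tags.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    BLOCK = "block"    # starts a new segment, like <text:p>
    INLINE = "inline"  # sits inside a segment, like <text:span>
    SKIP = "skip"      # carries no translatable text


@dataclass(frozen=True)
class Rule:
    kind: Kind
    subflow_attrs: tuple[str, ...] = ()  # attribute values to extract as subflows


# Hypothetical metadata of the kind the specification could publish.
# Entries are illustrative assumptions, not normative classifications.
RULES = {
    "text:p": Rule(Kind.BLOCK),
    "text:h": Rule(Kind.BLOCK),
    "text:span": Rule(Kind.INLINE),
    "text:sequence": Rule(Kind.INLINE),
    "text:user-defined": Rule(Kind.INLINE),
    "text:alphabetical-index-mark": Rule(Kind.INLINE, ("text:string-value",)),
    "text:sequence-decls": Rule(Kind.SKIP),
}

# Conservative fallback: an unlisted element is kept as an opaque inline tag,
# so a filter never splits a sentence at an element it does not understand.
DEFAULT = Rule(Kind.INLINE)


def lookup(name: str) -> Rule:
    """Return the rule for a 'prefix:local' element name, or the fallback."""
    return RULES.get(name, DEFAULT)


def uncovered(names) -> list[str]:
    """List the element names met in a document that the rule table does not cover."""
    return sorted(set(names).difference(RULES))
```

With a table like this maintained alongside the specification, tool developers would only need to report elements returned by uncovered() instead of guessing at each one.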

Regards,
Jean-Christophe Helary

