Sorry to bug Authors with this, but I can't seem to get a reaction
on OD-users...
Since Authors is also involved in the process of translating, I
thought we could discuss that as a separate thread, not so much
directly related to the contents of a document but to access to its
contents for translation.
I wrote this mail with OmegaT in mind since it seems to be used more
and more within the OOo community to translate OOo-produced documents,
and not so much documents imported from the MS world.
The Italian documentation translation group (I think) seems to be
using OmegaT and thanks to their work we have discovered glitches in
the OOo tag support. I think the reason is that before, when OmegaT
was used mainly to translate texts imported from MS, OOo filtering
was very conservative and would produce a very small number of OOo
tags, which were all properly handled by OmegaT. Now that people use
OOo to create documents (and this is especially true now for the OOo
documentation process), authors are free to use the full range of
creative options in OOo, and the generated documents are much more
complex.
As written in the mail below (originally 3 mails sent to xml-dev,
recombined into one sent to OD-users) most xml based translation
standards expect three types of string formats in a document:
-block level format->equivalent of <text:p> in OOo or <p> in HTML,
sets a property for a whole block of text
-inline level format->equivalent of <text:span> in OOo or <span> in
HTML, sets a property for a subset of the block
-subflows->for example the alternative text for a picture (which
appears in a box when the mouse hovers over the picture, etc.); it is
mostly _within_ the tag, as an attribute value, like
<whatever:whatever name="alternate text"> in OOo, similar to
<h2 id="identity of this title"> in HTML.
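
To make the three categories concrete, here is a minimal Python sketch
run on a hand-written fragment. The BLOCK, INLINE and SUBFLOW_ATTRS
tables are my own guesses for illustration, not a classification taken
from the OD specification, and draw:name merely stands in for any
attribute that might carry translatable text.

```python
import xml.etree.ElementTree as ET

TEXT = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
DRAW = "urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"

# Assumed classification, for illustration only (not taken from the spec).
BLOCK = {f"{{{TEXT}}}p", f"{{{TEXT}}}h"}
INLINE = {f"{{{TEXT}}}span"}
SUBFLOW_ATTRS = {f"{{{DRAW}}}name"}

# Hand-written fragment: one paragraph, one span, one frame with a name.
SAMPLE = (
    f'<text:p xmlns:text="{TEXT}" xmlns:draw="{DRAW}">'
    'Click <text:span>here</text:span>'
    '<draw:frame draw:name="A red button"/> to go on.'
    '</text:p>'
)

def classify(root):
    """Return (kind, value) pairs for known elements and localizable attributes."""
    found = []
    for elem in root.iter():  # document order
        local_name = elem.tag.split("}")[1]
        if elem.tag in BLOCK:
            found.append(("block", local_name))
        elif elem.tag in INLINE:
            found.append(("inline", local_name))
        # Subflows live in attribute values, not between tags.
        for name, value in elem.attrib.items():
            if name in SUBFLOW_ATTRS:
                found.append(("subflow", value))
    return found

result = classify(ET.fromstring(SAMPLE))
for kind, value in result:
    print(kind, value)
```

The point of the sketch is that the three tables have to be written by
hand today, which is exactly the information I am looking for.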
So, that was the background of the mail. If you have any interest in
such topics please go on reading :)
Regards,
Jean-Christophe Helary
==================================
(3 mails combined in one. Already sent to xml-dev on OOo, but I
thought maybe this list was more relevant)
I would like to know if there is a "simple" way to identify:
-block level textual information
-inline level textual information
-localizable subflows present in tag attribute values.
The goal is to parse an OD file and properly segment the text
for use in a CAT tool.
By "subflows" I mean information that is not between tags but inside
tags, as attribute values (for example text:name="something like
alternative text for graphic items").
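
As a rough sketch of the kind of parsing I have in mind (not how OmegaT
or any existing filter actually works), the Python below reads
content.xml out of an OD package, emits one segment per block element
with child elements replaced by numbered placeholders, and emits each
subflow as a segment of its own. The <t0> placeholder naming, the BLOCK
and SUBFLOW_ATTRS tables and the in-memory test document are all
assumptions made up for the example.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

TEXT = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
DRAW = "urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"
OFFICE = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
BLOCK = {f"{{{TEXT}}}p", f"{{{TEXT}}}h"}  # assumed block-level elements
SUBFLOW_ATTRS = {f"{{{DRAW}}}name"}       # assumed localizable attributes

def flatten(elem, counter, subflows):
    """Return elem's text with each child element replaced by a numbered placeholder."""
    parts = [elem.text or ""]
    for child in elem:
        n = counter[0]
        counter[0] += 1
        # Collect localizable attribute values as separate subflows.
        for name, value in child.attrib.items():
            if name in SUBFLOW_ATTRS:
                subflows.append(value)
        inner = flatten(child, counter, subflows)
        parts.append(f"<t{n}>{inner}</t{n}>" if inner else f"<t{n}/>")
        parts.append(child.tail or "")
    return "".join(parts)

def extract_segments(od_file):
    """Return block segments, each followed by the subflows found inside it.

    Simplification: blocks nested inside other blocks (for example inside
    a text box) are not handled specially and would be emitted twice.
    """
    with zipfile.ZipFile(od_file) as archive:
        root = ET.fromstring(archive.read("content.xml"))
    segments = []
    for block in root.iter():
        if block.tag not in BLOCK:
            continue
        subflows = []
        segments.append(flatten(block, [0], subflows))
        segments.extend(subflows)
    return segments

# Build a tiny stand-in package in memory so the sketch runs on its own.
content = (
    f'<office:document-content xmlns:office="{OFFICE}" '
    f'xmlns:text="{TEXT}" xmlns:draw="{DRAW}">'
    '<text:p>Click <text:span>here</text:span>'
    '<draw:frame draw:name="A red button"/> to go on.</text:p>'
    '</office:document-content>'
)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("content.xml", content)
buffer.seek(0)

segments = extract_segments(buffer)
for segment in segments:
    print(segment)
```

Everything here depends on knowing in advance which elements are blocks
and which attributes are localizable, so the sketch is only as good as
the two tables at the top.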
It looks like there is a very wide range of possible
<text:"attributes">, and I found that <text:sequence> seems to be
inline, as does <text:user-defined>.
I am looking for an extensive list of such attributes and their
characteristics (block or inline).
The point is that OD needs proper parsing of its localisable
data so that computer-assisted translation tools (the apps translators
use every day) give proper access to the right data within their
translation framework.
It would be nice if OD provided such "meta" information as to what
is localizable, what is not, and how that fits in the
block/inline/subflow model that most translation-related standards
(TMX/XLIFF etc.) are based on.
Of course, developers can always check the specification and take
guesses as to what is what, just like I am trying to do right now,
but I'd say it could be part of the OD specification to provide
easy access to such meta information, to make sure there is a proper
implementation on the CAT tool development side.
Regards,
Jean-Christophe Helary