Hi again,

On Mon, Mar 10, 2008 at 9:47 AM, Mathias Bauer <[EMAIL PROTECTED]> wrote:
> Hi Julen,
>
>  thanks for your proposal. I would like to clarify some questions before
>  we can decide whether we can assign a possible mentor to this project
>  and who this might be.
>
>
>  Julen wrote:
>
>  > Hello,
>  >
>  > I'm writing to make a proposal for the upcoming Google Summer of Code.
>  > I'm not sure whether this is the right mailing list to send this
>  > information, so my apologies if I'm posting in the wrong place.
>  >
>  > This would be, more or less, my proposal:
>  >
>  > Project Title
>  > TM based CAT tool for Writer
>  >
>  > Summary
>  > Develop a TM (Translation Memory) based CAT (Computer-aided
>  > Translation) tool as an extension for Writer. This would be something
>  > similar to the MS Word-based proprietary add-on Wordfast[1] or OmegaT[2], an
>  > open source tool written in Java which works as a desktop application.
>  >
>  > Abstract
>  > TM programs store previously translated source and target texts into a
>  > database in order to use them in the translation of new texts. Source
>  > text is split into translation units called segments. TMs are easily
>  > exportable and can be exchanged using an open standard format called
>  > TMX (Translation Memory eXchange)[3], which is implemented on top of
>  > XML.
>  > Any text file OpenOffice.org can open could be translated with this
>  > tool just by applying the appropriate segmentation rules for each
>  > file type.
>
>  If I understand TMX correctly, this approach will lose structural
>  information and attributes of the translated text. It would be nice to
>  have an extension that can retain as much of that information as
>  possible. This would require a solution that utilizes our new text
>  checking and markup API and stores some information along with the
>  generated TMX files that enables the extension to reestablish the
>  document structure and attributes. More or less it would mean that the
>  number and order of paragraphs could be stored along with the text that
>  is going to be translated. What do you think?

TMX is just an exchange format, and it can be used as a source to
feed a TM database. It's like exporting our document to another
format (but something more ;). So, don't confuse TMX (a specific
format) with TM (a general approach).
Although the TMX format can store custom properties for each translation
unit, the format itself doesn't include structural information about the
original document (e.g. an ODT file). It's just plain text: translation
units ready to be used. For more information about this format, see the
specification online[1].

>  > Professional translators have used CAT tools for many years now,
>  > taking advantage of new technologies applied to natural language and
>  > finding in these tools a significant help for their day-to-day work.
>  > Nowadays, translators have a wide variety of documents to translate,
>  > including text documents or even files related to software
>  > localization. Since many translators work in an office environment,
>  > Word-based solutions are widely used, e.g. Wordfast.
>  > OpenOffice.org lacks this kind of tool, and therefore it would
>  > open a door for translators to the open source community. This
>  > would benefit both translators and especially OpenOffice.org, by
>  > extending its popularity.
>
>  Are there any Open Source translation tools available that use TMX, at
>  least in development? For a GSOC project it would be better to have
>  something directly usable in the end. It could also create a bridge
>  between the Open Source communities.

As I pointed out in my first mail, there's an open source CAT tool
written in Java ready to be used: OmegaT[2]. I would suggest giving
it a try, just to understand what this tool should be able to do and to
clarify concepts.

Please feel free to ask for any more information if needed.

>  Best regards,
>  Mathias

Thanks again,

Julen.

[1] http://www.lisa.org/fileadmin/standards/tmx1.4/tmx.htm
[2] http://www.omegat.org/en/omegat.html
