Hi again,

On Mon, Mar 10, 2008 at 9:47 AM, Mathias Bauer <[EMAIL PROTECTED]> wrote:
> Hi Julen,
>
> thanks for your proposal. I would like to clarify some questions before
> we can decide whether we can assign a possible mentor to this project
> and who this might be.
>
>
> Julen wrote:
>
> > Hello,
> >
> > I'm writing to make a proposal for the upcoming Google Summer of Code.
> > I'm not sure whether this is the right mailing list to send this
> > information, so my apologies if I'm posting in the wrong place.
> >
> > This would be, more or less, my proposal:
> >
> > Project Title
> > TM based CAT tool for Writer
> >
> > Summary
> > Develop a TM (Translation Memory) based CAT (Computer-aided
> > Translation) tool as an extension for Writer. This would be something
> > similar to the MS Word-based proprietary add-on Wordfast[1] or
> > OmegaT[2], an open source tool written in Java which works as a
> > desktop application.
> >
> > Abstract
> > TM programs store previously translated source and target texts in a
> > database in order to use them in the translation of new texts. Source
> > text is split into translation units called segments. TMs are easily
> > exportable and can be exchanged using an open standard format called
> > TMX (Translation Memory eXchange)[3], which is implemented on top of
> > XML.
> > Any text file OpenOffice.org can open could be translated with this
> > tool simply by applying the appropriate segmentation rules for each
> > file type.
>
> If I understand TMX correctly, this approach will lose structural
> information and attributes of the translated text. It would be nice to
> have an extension that can retain as much of that information as
> possible. This would require a solution that utilizes our new text
> checking and markup API and stores some information along with the
> generated TMX files that enables the extension to reestablish the
> document structure and attributes. More or less it would mean that the
> number and order of paragraphs could be stored along with the text that
> is going to be translated. What do you think?
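As a rough sketch of what "storing the order of paragraphs along with the text" could look like inside a TMX file, the Python below writes one TMX 1.4 translation unit (`<tu>`) per segment and attaches the paragraph position as a custom `<prop>`. This assumes TMX 1.4's header attributes and its convention that user-defined property types start with `x-`; the property name `x-paragraph-index` and the tool name `writer-cat-sketch` are hypothetical. It only illustrates the data layout, not the text checking and markup API mentioned above.

```python
import xml.etree.ElementTree as ET

# Hypothetical property type. By convention, TMX uses the "x-" prefix
# for user-defined <prop> types; this particular name is our invention.
PARA_PROP = "x-paragraph-index"
# ElementTree spells xml:lang with the full XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_tmx(units, srclang="en", tgtlang="es"):
    """Build a TMX 1.4 tree from (paragraph_index, source, target) tuples."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "writer-cat-sketch",  # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "none",
        "adminlang": "en",
        "srclang": srclang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for para_index, source, target in units:
        tu = ET.SubElement(body, "tu")
        # <prop> elements come before the <tuv> elements inside a <tu>.
        ET.SubElement(tu, "prop", type=PARA_PROP).text = str(para_index)
        for lang, text in ((srclang, source), (tgtlang, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.ElementTree(tmx)


def paragraph_order(tree):
    """Return the stored paragraph indices in file order."""
    return [int(p.text) for p in tree.iter("prop") if p.get("type") == PARA_PROP]


if __name__ == "__main__":
    tree = build_tmx([(0, "Hello world.", "Hola mundo."),
                      (1, "Good morning.", "Buenos días.")])
    ET.indent(tree)  # pretty-printing; requires Python 3.9+
    print(ET.tostring(tree.getroot(), encoding="unicode"))
    print(paragraph_order(tree))  # [0, 1]
```

Any TMX consumer that does not know the `x-` property should simply carry it along, so the extra information would not break exchange with other tools.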

TMX is just an exchange format, and it can be used as a source to feed a
TM database. It's like exporting our document to another format (but
something more ;). So, don't confuse TMX (a specific format) with TM (a
general approach). Although the TMX format can store custom properties
for a translation unit, the format itself doesn't include structural
information about a document, e.g. an .odt file. It's just plain text:
translation units ready to be used. For more information about this
format, see the online specification[1].

> > Professional translators have used CAT tools for many years, taking
> > advantage of new technologies applied to natural language and finding
> > in these tools significant help for their day-to-day work.
> > Nowadays, translators have a wide variety of documents to translate,
> > including text documents or even files related to software
> > localization. Since many translators work in an office environment,
> > Word-based solutions are widely used, e.g. Wordfast.
> > OpenOffice.org lacks this kind of tool; it would therefore open a
> > door for translators to the open source community. This would
> > benefit both translators and especially OpenOffice.org, extending
> > its popularity.
>
> Are there any Open Source translation tools available that use TMX, at
> least in development? For a GSOC project it would be better to have
> something directly usable in the end. It could also create a bridge
> between the Open Source communities.

As I pointed out in my first mail, there's an open source CAT tool
written in Java ready to be used: OmegaT[2]. I would suggest giving it a
try, just to understand what this tool should be able to do and to
clarify concepts.

Please feel free to ask for more information if needed.

> Best regards,
> Mathias

Thanks again,
Julen.
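To make the segment-based workflow from the quoted abstract a bit more concrete, here is a toy Python sketch of what a TM-based tool does with a text: split it into segments, then look each segment up in the memory, falling back to a fuzzy match when there is no exact hit. The regex segmenter and the `difflib` similarity ratio are stand-ins chosen for illustration; they are not how OmegaT or Wordfast actually segment text or score matches, and the class and method names are hypothetical.

```python
import re
from difflib import SequenceMatcher

# Naive rule: a segment ends after ".", "!" or "?" followed by whitespace.
# Real CAT tools use configurable, language-specific segmentation rules;
# this regex is only an illustrative stand-in.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def segment(text):
    """Split text into sentence-level segments."""
    return [s for s in SENTENCE_END.split(text.strip()) if s]


class TranslationMemory:
    """Toy in-memory TM mapping source segments to their translations."""

    def __init__(self):
        self.entries = {}

    def add(self, source, target):
        self.entries[source] = target

    def lookup(self, seg, threshold=0.75):
        """Return (score, source, target) for the best match, or None.

        Exact matches score 1.0; otherwise difflib's similarity ratio is
        used as a stand-in for a real fuzzy-match score.
        """
        if seg in self.entries:
            return 1.0, seg, self.entries[seg]
        best = None
        for source, target in self.entries.items():
            score = SequenceMatcher(None, seg, source).ratio()
            if score >= threshold and (best is None or score > best[0]):
                best = (score, source, target)
        return best


if __name__ == "__main__":
    tm = TranslationMemory()
    tm.add("Open the File menu.", "Abra el menú Archivo.")
    for seg in segment("Open the file menu. Save your work!"):
        print(seg, "->", tm.lookup(seg))
```

The linear scan over every entry is only acceptable for a toy; a usable TM would index its entries so that fuzzy lookups stay fast as the memory grows.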

[1] http://www.lisa.org/fileadmin/standards/tmx1.4/tmx.htm
[2] http://www.omegat.org/en/omegat.html
