Re: [Moses-support] Moses-support Digest, Vol 41, Issue 31

Tue, 23 Mar 2010 10:51:13 -0700

Precision Translation Tools announces the release of a new open-source
data cleaning initiative, Corpus Filtergraph CE.


Corpus Filtergraph CE (Community Edition) is a cross-platform Python
toolbox for extracting, filtering and aligning parallel language data for
statistical machine translation (SMT) systems. Corpus Filtergraph CE enables
users to transform existing translation memories and other documents into
parallel training corpora and language models for SMT.

The filtergraph manager guides documents through a series of processing
nodes. The nodes use a common plug-in API to host almost any open-source
library, such as sentence splitters and aligners. One example plugin is
currently available, and we invite Moses Decoder developers to participate
and develop plugins to share with the wider SMT community.
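
To make the filtergraph idea concrete, here is a minimal Python sketch of
segment pairs flowing through a chain of processing nodes. The node
interface, run_filtergraph, and the three example nodes are hypothetical
illustrations chosen for this sketch, not the Corpus Filtergraph CE plug-in API.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Pair = Tuple[str, str]
# Hypothetical node interface: a callable that consumes a stream of
# (source, target) segment pairs and yields a (possibly filtered) stream.
Node = Callable[[Iterable[Pair]], Iterator[Pair]]


def strip_whitespace(pairs: Iterable[Pair]) -> Iterator[Pair]:
    """Trim leading/trailing whitespace on both sides of each pair."""
    for src, tgt in pairs:
        yield src.strip(), tgt.strip()


def drop_empty(pairs: Iterable[Pair]) -> Iterator[Pair]:
    """Discard pairs where either side is empty."""
    for src, tgt in pairs:
        if src and tgt:
            yield src, tgt


def max_length_ratio(ratio: float) -> Node:
    """Build a node that drops pairs whose character lengths differ by more than `ratio`."""
    def node(pairs: Iterable[Pair]) -> Iterator[Pair]:
        for src, tgt in pairs:
            short, long_ = sorted((len(src), len(tgt)))
            if short and long_ / short <= ratio:
                yield src, tgt
    return node


def run_filtergraph(pairs: Iterable[Pair], nodes: List[Node]) -> List[Pair]:
    """Chain the nodes lazily, then materialize the cleaned corpus."""
    stream: Iterable[Pair] = pairs
    for node in nodes:
        stream = node(stream)
    return list(stream)


cleaned = run_filtergraph(
    [("  Hello. ", " Hallo. "), ("", "Leer"), ("Yes.", "This is far too long for a yes.")],
    [strip_whitespace, drop_empty, max_length_ratio(3.0)],
)
print(cleaned)  # [('Hello.', 'Hallo.')]
```

Because each node is a generator, the chain is lazy: a large corpus streams
through all nodes one pair at a time instead of being held in memory
between stages.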

http://www.precisiontranslationtools.com/ptt/tools.html

Best regards, 
Tom
Precision Translation Tools Co., Ltd.



On Tue, Mar 23, 2010 at 12:03 PM, <[email protected]> wrote:

Send Moses-support mailing list submissions to
       [email protected]

To subscribe or unsubscribe via the World Wide Web, visit
       http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
       [email protected]

You can reach the person managing the list at
       [email protected]

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

  1. Re: Tool for segmenting the sentences of a bilingual corpus
     (Achim Ruopp)
  2. Re: Tool for segmenting the sentences of a bilingual corpus
     (John Morgan)


----------------------------------------------------------------------

Message: 1
Date: Mon, 22 Mar 2010 15:09:56 -0400
From: Achim Ruopp <[email protected]>
Subject: Re: [Moses-support] Tool for segmenting the sentences of a
       bilingual corpus
To: Jörg Tiedemann <[email protected]>
Cc: [email protected]
Message-ID:
       <[email protected]>
Content-Type: text/plain; charset=ISO-8859-1

Just in case you need a library: I recently packaged the Europarl
sentence splitter and sentence aligner tools into two Perl modules on
CPAN:
http://search.cpan.org/~achimru/Lingua-Sentence-1.00/
http://search.cpan.org/~achimru/Text-GaleChurch-1.00/
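
Text::GaleChurch packages a length-based aligner in the style of Gale &
Church. As a rough illustration of how that family of aligners works (a
sketch, not the module's code), the Python below scores each candidate
"bead" (1-1, 1-0, 0-1, 2-1, 1-2 or 2-2 sentences) by its prior probability
and by how well the character lengths match under a normal model, then
picks the cheapest bead sequence with dynamic programming. The priors and
the variance 6.8 are the values reported in the Gale & Church paper;
scaling the variance by the average of the two lengths, rather than the
source length alone, is an assumption made here so that 0-1 beads get a
finite cost.

```python
import math

# Prior probability of each bead type (source sentences, target sentences),
# as reported by Gale & Church (1993).
PRIORS = {
    (1, 1): 0.89,
    (1, 0): 0.0099,
    (0, 1): 0.0099,
    (2, 1): 0.089,
    (1, 2): 0.089,
    (2, 2): 0.011,
}
MEAN = 1.0      # expected target characters per source character
VARIANCE = 6.8  # variance of that ratio, per character


def bead_cost(len_src, len_tgt, prior):
    """-log P(bead type) - log P(|delta|) under a normal length model."""
    avg = (len_src + len_tgt / MEAN) / 2.0
    delta = 0.0 if avg == 0 else (len_tgt - len_src * MEAN) / math.sqrt(avg * VARIANCE)
    # Two-sided tail probability 2 * (1 - Phi(|delta|)) == erfc(|delta| / sqrt(2)).
    tail = max(math.erfc(abs(delta) / math.sqrt(2.0)), 1e-300)
    return -math.log(tail) - math.log(prior)


def align(src, tgt):
    """Align two lists of sentences; return a list of (src_indices, tgt_indices) beads."""
    ls = [len(s) for s in src]
    lt = [len(t) for t in tgt]
    n, m = len(ls), len(lt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            for (di, dj), prior in PRIORS.items():
                if di > i or dj > j or cost[i - di][j - dj] == inf:
                    continue
                c = cost[i - di][j - dj] + bead_cost(sum(ls[i - di:i]), sum(lt[j - dj:j]), prior)
                if c < cost[i][j]:
                    cost[i][j] = c
                    back[i][j] = (di, dj)
    # Follow back-pointers from the end to recover the best bead sequence.
    beads, i, j = [], n, m
    while i > 0 or j > 0:
        di, dj = back[i][j]
        beads.append((list(range(i - di, i)), list(range(j - dj, j))))
        i, j = i - di, j - dj
    return beads[::-1]


print(align(["A short one.", "And another.", "Final sentence here."],
            ["A short one and another.", "Final sentence here."]))
# [([0, 1], [0]), ([2], [1])]
```

Real corpora are usually aligned paragraph by paragraph (or between other
hard delimiters) first, which keeps the dynamic-programming table small.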

Achim

2010/3/22 Jörg Tiedemann <[email protected]>:
>
> Europarl comes with a sentence aligner:
> http://statmt.org/europarl/v5/tools.tgz
>
> You can also use hunalign:
> http://mokk.bme.hu/resources/hunalign
> (look at the "realign" feature for lexical matching)
> GMA:
> http://nlp.cs.nyu.edu/GMA/
>
> Uplug includes all three and also a tool for interactive
> (semi-automatic) sentence alignment:
> http://sourceforge.net/projects/uplug/
> http://www.let.rug.nl/~tiedeman/Uplug/php/
>
>
> Jörg
>
>
> Raphael Payen wrote:
>> Hi
>>
>> From what I've seen, Moses, even with all the tools that go with it,
>> requires a sentence-aligned bilingual corpus as its input. What if we
>> only have an unaligned parallel corpus? Do you know if there are
>> tools available to do this sentence-level alignment? There seems to
>> be something in python-nltk, based on Gale & Church, but it is recent
>> and not yet completely part of the package. Besides, the Gale & Church
>> algorithm uses only sentence lengths; there are probably more
>> powerful algorithms using dictionaries of word alignment information?
>> (I mean "static" dictionaries provided beforehand. I guess
>> theoretically there could be ways to "dynamically" use a word aligner
>> like GIZA++ on an unaligned corpus, compute some word alignments, use
>> them to compute the sentence alignments, and feed this back into itself,
>> but static dictionaries seem more practical.)
>>
>> Also, since this step usually requires human supervision, do you know
>> if there are open-source / Unix GUI tools to assist in editing
>> the proposed alignments (comparable to Trados WinAlign)?
>>
>> Best regards,
>>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
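
Raphael's "static dictionary" idea, which Jörg's pointer to hunalign's
lexical matching also addresses, can be illustrated with a simple lexical
score: the fraction of source words that have a dictionary translation in
the candidate target sentence. This is a hypothetical sketch (tokenize,
dictionary_score and the toy lexicon are made up for illustration), not
hunalign's actual scoring.

```python
import re


def tokenize(sentence):
    """Lower-case word tokens; a deliberately naive tokenizer for this sketch."""
    return re.findall(r"\w+", sentence.lower())


def dictionary_score(src, tgt, lexicon):
    """Fraction of source tokens with at least one dictionary translation in tgt.

    lexicon maps a source word to a set of possible target words, i.e. a
    "static" bilingual dictionary supplied beforehand.
    """
    src_tokens = tokenize(src)
    if not src_tokens:
        return 0.0
    tgt_tokens = set(tokenize(tgt))
    hits = sum(1 for tok in src_tokens if lexicon.get(tok, set()) & tgt_tokens)
    return hits / len(src_tokens)


# Toy English-German lexicon, for illustration only.
lexicon = {"the": {"das", "der", "die"}, "red": {"rote"}, "house": {"haus"}}
candidates = ["Der Hund bellt.", "Das rote Haus."]
best = max(candidates, key=lambda t: dictionary_score("The red house.", t, lexicon))
print(best)  # Das rote Haus.
```

One way to use such a score is to subtract a weighted version of it from a
length-based bead cost, so that candidates with good dictionary coverage
become cheaper; the weight would have to be tuned on held-out data.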



------------------------------

Message: 2
Date: Mon, 22 Mar 2010 21:40:10 -0400
From: "John Morgan" <[email protected]>
Subject: Re: [Moses-support] Tool for segmenting the sentences of a
       bilingual corpus
To: 'Jörg Tiedemann' <[email protected]>, 'Raphael
       Payen' <[email protected]>
Cc: [email protected]
Message-ID: <005401caca29$c6c57660$0201a...@surubi>
Content-Type: text/plain; charset=iso-8859-1

I think the bilingual sentence aligner by Bob Moore of Microsoft does what
you want.
http://research.microsoft.com/en-us/people/bobmoore/
J
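
As described in Moore's 2002 paper on sentence alignment, his aligner
combines both approaches Raphael mentions: a length-based first pass, then
IBM Model 1 word-translation probabilities trained on the most confident
sentence pairs from that pass, which are then used to realign the corpus.
The sketch below shows only the Model 1 EM training step on a toy corpus;
the function name train_model1 and the omission of the NULL word are
simplifications for illustration, not Moore's code.

```python
from collections import defaultdict


def train_model1(pairs, iterations=10):
    """Estimate IBM Model 1 lexical probabilities t[(f, e)] = P(f | e) with EM.

    pairs: list of (e_tokens, f_tokens) sentence pairs, e.g. the confident
    1-1 beads from a length-based first pass. No NULL word, for brevity.
    """
    f_vocab = {f for _, f_sent in pairs for f in f_sent}
    t = defaultdict(lambda: 1.0 / len(f_vocab))  # uniform start
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(f, e)
        total = defaultdict(float)  # expected counts c(e)
        for e_sent, f_sent in pairs:
            for f in f_sent:
                # E-step: share f's count among the e words that could generate it.
                z = sum(t[(f, e)] for e in e_sent)
                for e in e_sent:
                    share = t[(f, e)] / z
                    count[(f, e)] += share
                    total[e] += share
        # M-step: renormalise so that, for each e, P(f | e) sums to 1 over f.
        t = defaultdict(float, {fe: c / total[fe[1]] for fe, c in count.items()})
    return t


corpus = [
    ("the house".split(), "das haus".split()),
    ("the book".split(), "das buch".split()),
    ("a book".split(), "ein buch".split()),
]
t = train_model1(corpus, iterations=20)
f_words = ["das", "haus", "buch", "ein"]
print(max(f_words, key=lambda f: t[(f, "book")]))  # buch
```

With these estimates, a candidate sentence pair can be scored by how well
its words translate each other, which is the "dynamic" bootstrapping
Raphael describes, without needing a dictionary supplied beforehand.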


-----Original Message-----
From: [email protected] [mailto:[email protected]]
On Behalf Of Jörg Tiedemann
Sent: Monday, March 22, 2010 11:36 AM
To: Raphael Payen
Cc: [email protected]
Subject: Re: [Moses-support] Tool for segmenting the sentences of a
bilingual corpus






------------------------------



End of Moses-support Digest, Vol 41, Issue 31
*********************************************

