Dear Nadir,

Thank you very much for explaining transliteration. I have set both transliteration-module and post-decoding-transliteration to "yes" in the EMS configuration file used for en-ru.
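
For reference, a minimal sketch of how the two settings would read in the EMS configuration file. Only the two key names and their values come from this thread; which section of the file they belong in is not stated here, so no section header is shown, and the comments are explanatory additions.

```
# Train the transliteration model (on its own, Moses does nothing further with it)
transliteration-module = "yes"
# Replace OOV words with their best transliteration after decoding (Method 2 in the paper)
post-decoding-transliteration = "yes"
```

The Advanced.OOVs page linked in the reply below is the place to check the exact placement of these lines.
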
Best Regards,
Ergun

Ergun Biçici, CNGL, School of Computing, DCU, www.cngl.ie
http://www.computing.dcu.ie/~ebicici/

---------- Forwarded message ----------
From: Nadir Durrani <[email protected]>
Date: Wed, May 6, 2015 at 11:17 AM
Subject: Re: Transliteration model is using processPhraseTable, which is not found in Moses version 3.0
To: Ergun Bicici <[email protected]>

Hi Ergun,

If you only set

transliteration-module = "yes"

Moses will train the transliteration system but will not do anything with it. You have to choose between post-decoding and in-decoding transliteration.

In the post-decoding method, transliteration is done in a post-decoding step: the decoder has already translated all the sentences, and you only need to replace the OOV words with their best transliteration given the context. This is Method 2 as described in the following paper:

http://aclweb.org/anthology//E/E14/E14-4029.pdf

You can enable it with

post-decoding-transliteration = "yes"

With the in-decoding method (Method 3 in the paper), transliteration is done inside the decoder on the fly. The advantage over Method 2, in theory, is that the OOV word can also be reordered and other features can be used. But it does not give any clear-cut gains.

More details here: http://www.statmt.org/moses/?n=Advanced.OOVs

Nadir

>> On Tue, May 5, 2015 at 5:33 PM, Ergun Bicici
>> <[email protected]> wrote:
>> >
>> > Hi Nadir,
>> >
>> > I am using Moses 3.0 and, for transliteration to work, I copied
>> > scripts/Transliteration/ from the latest version onto the Moses 3.0
>> > path, re-ran, and obtained translation results.
>> >
>> > Best Regards,
>> > Ergun
>> >
>> > Ergun Biçici, CNGL, School of Computing, DCU, www.cngl.ie
>> > http://www.computing.dcu.ie/~ebicici/
>> >
>> > On Mon, May 4, 2015 at 7:32 AM, Nadir Durrani <[email protected]>
>> > wrote:
>> >>
>> >> Hi Ergun,
>> >>
>> >> processPhraseTable is no longer supported by Moses.
>> >> But I see that Phil Williams has already fixed this problem in the
>> >> transliteration module, by changing
>> >>
>> >> `$MOSES_SRC/scripts/training/filter-model-given-input.pl
>> >> $TRANSLIT_MODEL/evaluation/$eval_file.filtered
>> >> $TRANSLIT_MODEL/evaluation/$eval_file.moses.table.ini
>> >> $TRANSLIT_MODEL/evaluation/$eval_file -Binarizer
>> >> "$MOSES_SRC/bin/processPhraseTable"`;
>> >>
>> >> to
>> >>
>> >> `$MOSES_SRC/scripts/training/filter-model-given-input.pl
>> >> $TRANSLIT_MODEL/evaluation/$eval_file.filtered
>> >> $TRANSLIT_MODEL/evaluation/$eval_file.moses.table.ini
>> >> $TRANSLIT_MODEL/evaluation/$eval_file -Binarizer
>> >> "$MOSES_SRC/bin/CreateOnDiskPt 1 1 4 100 2"`;
>> >>
>> >> in
>> >>
>> >> path-to-moses/scripts/Transliteration/in-decoding-transliteration.pl
>> >>
>> >> Here's the commit:
>> >>
>> >> https://github.com/moses-smt/mosesdecoder/commit/7e54e23fe234ac48f44beeee0e473d09a5b4d5f6
>> >>
>> >> Maybe you pulled an in-between version in which processPhraseTable
>> >> had been removed but the transliteration scripts had not yet been fixed.
>> >>
>> >> Cheers,
>> >> Nadir
>> >>
>> >> On Mon, May 4, 2015 at 7:46 AM, <[email protected]> wrote:
>> >> > Today's Topics:
>> >> >
>> >> >    1. Re: 12-gram language model ARPA file for 16GB (liling tan)
>> >> >    2. Transliteration model is using processPhraseTable, which is
>> >> >       not found in Moses version 3.0 (Ergun Bicici)
>> >> >    3. Re: Transliteration model is using processPhraseTable, which
>> >> >       is not found in Moses version 3.0 (Hieu Hoang)
>> >> >    4. Europarl monolingual corpus (Hieu Hoang)
>> >> >
>> >> > ----------------------------------------------------------------------
>> >> >
>> >> > Message: 1
>> >> > Date: Sun, 3 May 2015 19:44:12 +0200
>> >> > From: liling tan <[email protected]>
>> >> > Subject: Re: [Moses-support] 12-gram language model ARPA file for 16GB
>> >> > To: moses-support <[email protected]>
>> >> > Message-ID:
>> >> >   <CAKzPaJJ7fY=9C89POact542vu32d+H3=0i_Dnaj=yfizbfa...@mail.gmail.com>
>> >> > Content-Type: text/plain; charset="utf-8"
>> >> >
>> >> > Dear Moses devs/users,
>> >> >
>> >> > For now, I only know that it takes more than 250GB. I've 250GB of free
>> >> > space and KenLM got "poisoned" by insufficient space...
>> >> >
>> >> > Does anyone have an idea how big a 12-gram language model ARPA file
>> >> > trained on 16GB of text would become?
>> >> >
>> >> > STDERR:
>> >> >
>> >> > === 1/5 Counting and sorting n-grams ===
>> >> > Reading /media/2tb/wmt15/corpus.truecase/train-lm.en
>> >> > ----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
>> >> > tcmalloc: large alloc 7846035456 bytes == 0x10f4000 @
>> >> > tcmalloc: large alloc 73229664256 bytes == 0x1d542e000 @
>> >> > ****************************************************************************************************
>> >> > Unigram tokens 3038737446 types 5924314
>> >> > === 2/5 Calculating and sorting adjusted counts ===
>> >> > Chain sizes: 1:71091768 2:804524736 3:1508483968 4:2413574144 5:3519795968 6:4827148288 7:6335632384 8:8045247488 9:9955993600 10:12067871744 11:14380880896 12:16895020032
>> >> > tcmalloc: large alloc 16895025152 bytes == 0x1d542e000 @
>> >> > tcmalloc: large alloc 2413576192 bytes == 0x8f2a0000 @
>> >> > tcmalloc: large alloc 3519799296 bytes == 0x5c4488000 @
>> >> > tcmalloc: large alloc 4827152384 bytes == 0x696146000 @
>> >> > tcmalloc: large alloc 6335635456 bytes == 0x7b5cce000 @
>> >> > tcmalloc: large alloc 8045248512 bytes == 0x92f6f0000 @
>> >> > tcmalloc: large alloc 9955999744 bytes == 0xb0ef7c000 @
>> >> > tcmalloc: large alloc 12067872768 bytes == 0xd60644000 @
>> >> > tcmalloc: large alloc 14380883968 bytes == 0x12f616e000 @
>> >> > Last input should have been poison.
>> >> > Last input should have been poison.util/file.cc:196 in void util::WriteOrThrow(int, const void*, std::size_t) threw FDException because `ret < 1'.
>> >> > No space left on device in /tmp/PC2o3z (deleted) while writing 5301120368 bytes
>> >> >
>> >> > Last input should have been poison.util/file.cc:196 in void util::WriteOrThrow(int, const void*, std::size_t) threw FDException because `ret < 1'.
>> >> > No space left on device in /tmp/PftXeo (deleted) while writing 1941075872 bytesLast input should have been poison.
>> >> >
>> >> > util/file.cc:196 in void util::WriteOrThrow(int, const void*, std::size_t) threw FDException because `ret < 1'.
>> >> > No space left on device in /tmp/CuZcPM (deleted) while writing 2984722272 bytes
>> >> >
>> >> > util/file.cc:196 in void util::WriteOrThrow(int, const void*, std::size_t) threw FDException because `ret < 1'.
>> >> > No space left on device in /tmp/F2bE8A (deleted) while writing 389439488 bytes
>> >> >
>> >> > Regards,
>> >> > Liling
>> >> >
>> >> > ------------------------------
>> >> >
>> >> > Message: 2
>> >> > Date: Sun, 3 May 2015 22:42:22 +0100
>> >> > From: Ergun Bicici <[email protected]>
>> >> > Subject: [Moses-support] Transliteration model is using
>> >> >   processPhraseTable, which is not found in Moses version 3.0
>> >> > To: moses-support <[email protected]>
>> >> > Message-ID:
>> >> >   <CAB2pGncpvc4roLXwLcFcXytZHKEqSZvzaX2L16Yfo=p-vq1...@mail.gmail.com>
>> >> > Content-Type: text/plain; charset="utf-8"
>> >> >
>> >> > binarizing...gzip -cd en-ru_path/model/Transliteration.8/tuning/filtered/phrase-table.0-0.1.1.gz | LC_ALL=C sort -T en-ru_path/model/Transliteration.8/tuning/filtered | moses_3.0/mosesdecoder/bin/processPhraseTable -ttable 0 0 - -nscores 4 -out en-ru_path/model/Transliteration.8/tuning/filtered/phrase-table.0-0.1.1
>> >> > sh: moses_3.0/mosesdecoder/bin/processPhraseTable: No such file or directory
>> >> > sort: write failed: standard output: Broken pipe
>> >> > sort: write error
>> >> >
>> >> > How can I have processPhraseTable built?
>> >> >
>> >> > Best Regards,
>> >> > Ergun
>> >> >
>> >> > Ergun Biçici, CNGL, School of Computing, DCU, www.cngl.ie
>> >> > http://www.computing.dcu.ie/~ebicici/
>> >> >
>> >> > ------------------------------
>> >> >
>> >> > Message: 3
>> >> > Date: Mon, 04 May 2015 08:31:18 +0400
>> >> > From: Hieu Hoang <[email protected]>
>> >> > Subject: Re: [Moses-support] Transliteration model is using
>> >> >   processPhraseTable, which is not found in Moses version 3.0
>> >> > To: Ergun Bicici <[email protected]>, moses-support
>> >> >   <[email protected]>
>> >> > Message-ID: <[email protected]>
>> >> > Content-Type: text/plain; charset="windows-1252"
>> >> >
>> >> > do you know where the processPhraseTable exec is being called from?
>> >> >
>> >> > it would be helpful so we can make sure it uses something else.
>> >> >
>> >> > if you really want processPhraseTable back, uncomment 3 lines in
>> >> > misc/Jamfile
>> >> >
>> >> > +++ b/misc/Jamfile
>> >> > @@ -1,8 +1,8 @@
>> >> > -#exe processPhraseTable : GenerateTuples.cpp processPhraseTable.cpp ..//boost_filesystem ../moses//moses ;
>> >> > +exe processPhraseTable : GenerateTuples.cpp processPhraseTable.cpp ..//boost_filesystem ../moses//moses ;
>> >> >
>> >> > exe processLexicalTable : processLexicalTable.cpp ..//boost_filesystem ../moses//moses ;
>> >> >
>> >> > -#exe queryPhraseTable : queryPhraseTable.cpp ..//boost_filesystem ../moses//moses ;
>> >> > +exe queryPhraseTable : queryPhraseTable.cpp ..//boost_filesystem ../moses//moses ;
>> >> >
>> >> > exe queryLexicalTable : queryLexicalTable.cpp ..//boost_filesystem ../moses//moses ;
>> >> >
>> >> > @@ -46,6 +46,6 @@ $(TOP)//boost_iostreams $(TOP)//boost_program_options
>> >> > ;
>> >> >
>> >> > -alias programs : 1-1-Extraction TMining generateSequences processLexicalTable queryLexicalTable programsMin programsProbing merge-sorted prunePhraseTable ;
>> >> > -#processPhraseTable queryPhraseTable
>> >> > +alias programs : 1-1-Extraction TMining generateSequences processLexicalTable queryLexicalTable programsMin programsProbing merge-sorted prunePhraseTable processPhraseTable queryPhraseTable ;
>> >> >
>> >> > --
>> >> > Hieu Hoang
>> >> > Researcher
>> >> > New York University, Abu Dhabi
>> >> > http://www.hoang.co.uk/hieu
>> >> >
>> >> > ------------------------------
>> >> >
>> >> > Message: 4
>> >> > Date: Mon, 4 May 2015 08:46:15 +0400
>> >> > From: Hieu Hoang <[email protected]>
>> >> > Subject: [Moses-support] Europarl monolingual corpus
>> >> > To: moses-support <[email protected]>
>> >> > Message-ID:
>> >> >   <caekmkbio64f_m20rwnxydoj60fhez_oo+by+hzkw3tbfukp...@mail.gmail.com>
>> >> > Content-Type: text/plain; charset="utf-8"
>> >> >
>> >> > What's the easiest way to get the single-language data from the Europarl
>> >> > corpus as described in the 1st table in:
>> >> > http://statmt.org/europarl/
>> >> >
>> >> > I tried downloading the xml source
>> >> > http://statmt.org/europarl/v7/europarl.tgz
>> >> > stripping the xml and running split-sentence.perl, but this takes an
>> >> > unfathomably long time
>> >> >
>> >> > Hieu Hoang
>> >> > Researcher
>> >> > New York University, Abu Dhabi
>> >> > http://www.hoang.co.uk/hieu
>> >> >
>> >> > ------------------------------
>> >> >
>> >> > End of Moses-support Digest, Vol 103, Issue 5
>> >> > *********************************************
_______________________________________________ Moses-support mailing list [email protected] http://mailman.mit.edu/mailman/listinfo/moses-support
