Hi Megan, I've also had this problem in the past. In my case it was fixed by typing "export LC_ALL=C" prior to running the processPhraseTable command. I hope that helps.
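Here is a likely explanation. It is an assumption, not something confirmed from the processPhraseTable source. The tool seems to expect all entries for one source phrase to sit next to each other. Under a UTF-8 locale such as en_US.UTF-8, however, sort can treat spaces and punctuation as insignificant, so entries for "000 -" may end up separated by entries for a different phrase such as "000 --". With LC_ALL=C, sort compares raw bytes and keeps each source phrase in one block.

The Python sketch below checks a table for split blocks and re-sorts it in byte order. Some details are assumptions:
- source_phrase, first_noncontiguous_source and sort_c_locale are hypothetical helpers, not part of Moses.
- The " ||| " separator is taken from Megan's error line.

```python
"""Rough sketch: detect split source-phrase blocks and re-sort in byte order.

Assumptions (not taken from the Moses source): processPhraseTable needs all
entries for one source phrase to be adjacent, and fields are separated by
" ||| ". All helper names here are hypothetical.
"""
from typing import Iterable, List, Optional

SEP = " ||| "  # assumed field separator in a text phrase table


def source_phrase(line: str) -> str:
    """Return the first field (the source phrase) of a phrase-table line."""
    return line.split(SEP, 1)[0]


def first_noncontiguous_source(lines: Iterable[str]) -> Optional[str]:
    """Return the first source phrase that appears in two separate runs, else None."""
    seen = set()
    previous = None
    for line in lines:
        src = source_phrase(line)
        if src != previous:
            if src in seen:
                return src  # this phrase already had its own block earlier
            seen.add(src)
            previous = src
    return None


def sort_c_locale(lines: Iterable[str]) -> List[str]:
    """Sort by source phrase compared as raw UTF-8 bytes (like LC_ALL=C sort),
    breaking ties by comparing the whole line, also as bytes."""
    return sorted(
        lines,
        key=lambda line: (source_phrase(line).encode("utf-8"), line.encode("utf-8")),
    )


if __name__ == "__main__":
    # Made-up entries; only the first line comes from Megan's error message.
    table = [
        "000 - ||| 000 ? ||| (0) (1) ||| (0) (1) ||| 0.5 0.540651 0.25 0.178456 2.718",
        "000 -- ||| 000 ||| (0) (0) ||| (0,1) ||| 1 1 1 1 2.718",
        "000 - ||| 000 - ||| (0) (1) ||| (0) (1) ||| 0.5 0.5 0.5 0.5 2.718",
    ]
    print(first_noncontiguous_source(table))  # 000 -
    print(first_noncontiguous_source(sort_c_locale(table)))  # None
```

On the command line, the equivalent is simply to have LC_ALL=C in effect for the sort step of the pipeline. You can export it first, as above, or prefix the sort command with LC_ALL=C. Optionally, combine it with Hieu's -t"|" -k1,1 key so that only the source-phrase field is compared.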
Kevin.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of [EMAIL PROTECTED]
Sent: Monday, July 14, 2008 11:46 AM
To: [email protected]
Subject: Moses-support Digest, Vol 21, Issue 8

Send Moses-support mailing list submissions to
	[email protected]

To subscribe or unsubscribe via the World Wide Web, visit
	http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
	[EMAIL PROTECTED]

You can reach the person managing the list at
	[EMAIL PROTECTED]

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

   1. Re: OT: LDC2004E12 (Ham, Michael)
   2. Re: phrase table memory issue (Philipp Koehn)
   3. Re: Re : [getting started] help (Philipp Koehn)
   4. Re: phrase table memory issue (Megan Elmore ([EMAIL PROTECTED]))
   5. Re: [Bulk] Re: phrase table memory issue (Hieu Hoang)


----------------------------------------------------------------------

Message: 1
Date: Sun, 13 Jul 2008 22:08:14 -0400
From: "Ham, Michael" <[EMAIL PROTECTED]>
Subject: Re: [Moses-support] OT: LDC2004E12
To: <[email protected]>
Message-ID: <[EMAIL PROTECTED]>
Content-Type: text/plain; charset="us-ascii"

Those escape numbers are Unicode characters. The Chinese character set does not exist in ASCII, so you have to use UTF-8. In addition, you also need to install a font that can display Chinese characters. One that I have gotten to work, and that you may want to look into, is the Bitstream Cyberbit font. You can download it here:

http://http.netscape.com.edgesuite.net/pub/communicator/extras/fonts/windows/Cyberbit.ZIP

I hope this helps!
- Michael

------------------------------

Date: Fri, 11 Jul 2008 15:39:11 -0400
From: "John D. Burger" <[EMAIL PROTECTED]>
Subject: [Moses-support] OT: LDC2004E12
To: [email protected]
Message-ID: <[EMAIL PROTECTED]>
Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed

Sorry for the slightly off-topic message, but at least it's about MT:

We're using the UN Chinese-English Parallel Text collection (LDC2004E12) for some of our work. It has lots of odd sequences of the form:

  \x{a37e}

I presume these are hex codes indicating escaped characters or something, but I'm not sure what. Has anyone done anything with these, other than ignore or delete them?

Thanks.

- John Burger
  MITRE

------------------------------

Message: 2
Date: Sat, 12 Jul 2008 10:16:21 +0000 (UTC)
From: Vineet Kashyap <[EMAIL PROTECTED]>
Subject: [Moses-support] Unknown words
To: [email protected]
Message-ID: <[EMAIL PROTECTED]>
Content-Type: text/plain; charset=us-ascii

Hi all,

1. Is there a way to output unknown words to a separate file instead of dropping them? I think we could add those words to the dictionary, which would improve accuracy.

2. Also, when adding a dictionary to the parallel corpus as suggested by Philipp in the previous post, you have one word in the source language and the other in the target language, is that correct?

3. Does BLEU use a reference file with accurate human translations to estimate a score? And if not, would it be better to evaluate the system with such a reference file?

4. What BLEU value, as a percentage, means good translations? And for comparison purposes, how would a human judge an MT system's performance?

5. Can we train higher-order language models with SRILM on a small corpus, or do we have to use IRSTLM?

Thanks a lot in advance for taking the time to answer these questions.
Regards,
Vineet

------------------------------

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support


End of Moses-support Digest, Vol 21, Issue 7
********************************************

------------------------------

Message: 2
Date: Mon, 14 Jul 2008 06:46:47 +0100
From: "Philipp Koehn" <[EMAIL PROTECTED]>
Subject: Re: [Moses-support] phrase table memory issue
To: "Megan Elmore ([EMAIL PROTECTED])" <[EMAIL PROTECTED]>
Cc: [email protected]
Message-ID: <[EMAIL PROTECTED]>
Content-Type: text/plain; charset=ISO-8859-1

Hi,

are you sorting the phrase table?
Check the command as described on the Moses web site.

-phi

On Wed, Jul 9, 2008 at 8:21 PM, Megan Elmore ([EMAIL PROTECTED]) <[EMAIL PROTECTED]> wrote:
> Hello,
>
> Thanks very much for your quick reply. I am currently trying to generate a binary phrase table but am getting an error:
>
> ERROR: xsource phrase already inserted (B)!
> line(17): '000 - ||| 000 ? ||| (0) (1) ||| (0) (1) ||| 0.5 0.540651 0.25 0.178456 2.718'
> f: 2 0 2
>
> Does this indicate a problem with my phrase table or with the processPhraseTable process? If I need to run the training process differently, what error or warning messages generated during training, if any, would tell me that my phrase table has errors?
>
> Currently, the phrase table generated during training was left gzipped as phrase-table.0-0.gz. I am not sure if this is relevant, but maybe the odd naming (as opposed to just "phrase-table" as in the online documentation) points to a step of the training process that did not complete normally for me?
>
> -Megan
>
> ----- Original Message -----
> From: Philipp Koehn <[EMAIL PROTECTED]>
> Date: Wednesday, July 9, 2008 2:25 pm
> Subject: Re: [Moses-support] phrase table memory issue
> To: "Megan Elmore ([EMAIL PROTECTED])" <[EMAIL PROTECTED]>
> Cc: [email protected]
>
>> Hi,
>>
>> this is a sign that the phrase table is too big to load into memory.
>> There are three options:
>> (a) use the binary phrase table
>> (b) filter the phrase table for the test set you are using
>> (c) both
>>
>> See the Moses web page for details.
>>
>> -phi
>>
>> On Wed, Jul 9, 2008 at 7:17 PM, Megan Elmore ([EMAIL PROTECTED]) <[EMAIL PROTECTED]> wrote:
>> > Hello,
>> >
>> > I have installed Moses and run the training process using the europarl corpus, but am now having problems with the decoder loading the phrase table. Like a previous message on this list, I am getting the error
>> >
>> > terminate called after throwing an instance of 'std::bad_alloc'
>> >   what():  St9bad_alloc
>> > Aborted
>> >
>> > while the decoder is trying to load the phrase table, regardless of the machine I run the decoder on (I've tried four now). Is there a way I can optimize how much space the phrase table uses? Or is there something that could be going wrong in the training or decoding processes? I am not sure where to look for the error, but with a little direction I could keep trying to debug it.
>> >
>> > Thanks,
>> > -Megan E.

------------------------------

Message: 3
Date: Mon, 14 Jul 2008 07:14:38 +0100
From: "Philipp Koehn" <[EMAIL PROTECTED]>
Subject: Re: [Moses-support] Re : [getting started] help
To: "Pham Thi Anh Vi" <[EMAIL PROTECTED]>
Cc: [email protected]
Message-ID: <[EMAIL PROTECTED]>
Content-Type: text/plain; charset=UTF-8

Hi,

your compile with irstlm did not work correctly, otherwise it would recognize the language model option.

-phi

On Mon, Jul 7, 2008 at 2:17 AM, Pham Thi Anh Vi <[EMAIL PROTECTED]> wrote:
> Hi Mailing list,
>
>> On Tue, Apr 22, 2008 at 9:31 PM, sushil ronghe <[EMAIL PROTECTED]> wrote:
>> > Dear all,
>> >
>> > I am trying to compile moses with the irstlm library in the path.
>> > The compilation did not give any error, so I thought it was done.
>> >
>> > Then, while testing with the sample model
>> > http://www.statmt.org/moses/download/sample-models.tgz
>> > I got these messages:
>> > -------------------------
>> > Defined parameters (per moses.ini or switch):
>> > config: moses.ini
>> > input-factors: 0
>> > lmodel-file: 0 0 3 ../lm/europarl.srilm.gz
>> > mapping: T 0
>> > ttable-file: 0 0 1 phrase-table
>> > ttable-limit: 10
>> > weight-d: 1
>> > weight-l: 1
>> > weight-t: 1
>> > weight-w: 0
>> > Loading lexical distortion models...
>> > have 0 models
>> > Start loading LanguageModel ../lm/europarl.srilm.gz : [0.000] seconds
>> > ERROR:Language model type unknown. Probably not compiled into library
>> > ERROR:no LM created. We probably don't have it compiled
>> >
>> > I am unable to understand what this error message is suggesting.
>> >
>> > I have installed moses and irstlm on i686 with Ubuntu.
>> > The compilation did not give any error.
>> >
>> > Please help me figure out what is going wrong.
>> >
>> > Thanks
>
> I have the same Error.
> I supplied the following setting for the language model switch, like Emmanuel:
>
>> lmodel-file: 1 0 5 ../lm/europarl.srilm.blm
>
> but the error still appears. Here is my Moses config:
>
> #########################
> ### MOSES CONFIG FILE ###
> #########################
>
> # input factors
> [input-factors]
> 0
>
> # mapping steps
> [mapping]
> 0 T 0
>
> # translation tables: source-factors, target-factors, number of scores, file
> [ttable-file]
> 1 0 5 /home/zil/Working/Language_models/Rbtdfinal_280k/model/binary_phrasetable/phrase-table.0-0.gz
>
> # no generation models, no generation-file section
>
> # language models: type(srilm/irstlm), factors, order, file
> [lmodel-file]
> 1 0 5 /hoe/zil/Working/Language_models/Rbtdfinal_280k/binary_lm/GwtwVnthuquan.blm
>
> # limit on how many phrase translations e for each phrase f are loaded
> # 0 = all elements loaded
> [ttable-limit]
> 20
> 0
>
> # distortion (reordering) files
> [distortion-file]
> 0-0 msd-bidirectional-fe 6 /home/zil/Working/Language_models/Rbtdfinal_280k/model/binary_reordering/reordering-table.msd-bidirectional-fe.0.5.0-0.gz
>
> # distortion (reordering) weight
> [weight-d]
> 0.3
> 0.3
> 0.3
> 0.3
> 0.3
> 0.3
> 0.3
>
> # language model weights
> [weight-l]
> 0.5000
>
> # translation model weights
> [weight-t]
> 0.2
> 0.2
> 0.2
> 0.2
> 0.2
>
> # no generation models, no weight-generation section
>
> # word penalty
> [weight-w]
> -1
>
> [distortion-limit]
> 6
>
> Please help me figure out what is going wrong.
> --
> =============================
> Pham Thi Anh Vi
> VIEGRID JSC Huế
> Mobile phone: 0984693313

------------------------------

Message: 4
Date: Mon, 14 Jul 2008 11:17:17 -0400
From: "Megan Elmore ([EMAIL PROTECTED])" <[EMAIL PROTECTED]>
Subject: Re: [Moses-support] phrase table memory issue
To: [email protected]
Message-ID: <[EMAIL PROTECTED]>
Content-Type: text/plain; charset=iso-8859-1

Hello again,

Yes, I was using the command as described on the Moses web site at http://www.statmt.org/moses/?n=Moses.AdvancedFeatures. I have also tried piping the output of sort through uniq before piping it into processPhraseTable, and I got the same error. Perhaps I am unaware of some option to give sort or uniq that would fix this. At what step in the processPhraseTable code would this error be generated?

-Megan

------------------------------

Message: 5
Date: Mon, 14 Jul 2008 16:45:08 +0100
From: "Hieu Hoang" <[EMAIL PROTECTED]>
Subject: Re: [Moses-support] [Bulk] Re: phrase table memory issue
To: "'Megan Elmore'" <[EMAIL PROTECTED]>, <[email protected]>
Message-ID: <[EMAIL PROTECTED]>
Content-Type: text/plain; charset="iso-8859-1"

I get the same problem too. The issue seems to be with the obtuse Unix sort command. Some versions of sort may be sorting by a hash index rather than alphanumerically. Therefore, you need to force it to do an alphanumeric sort:

   sort -t"|" -k1,1

This fixed it for me. It's not the perfect solution, but it'll do for now.

Unix - guaranteed to give you a headache

Hieu Hoang
www.hoang.co.uk/hieu

--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

------------------------------

End of Moses-support Digest, Vol 21, Issue 8
********************************************
