Re: [Moses-support] binary phrase table issue

Wilson, Kevin Mon, 14 Jul 2008 12:33:58 -0700

Hi Megan,

I've also had this problem in the past. In my case it was fixed by
typing "export LC_ALL=C" prior to running the processPhraseTable
command. I hope that helps.
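
For anyone wondering why the locale matters here, below is a minimal sketch, not taken from the Moses code. It assumes that sort under LC_ALL=C compares raw bytes, and it uses a made-up loose_sort function as a rough stand-in for a locale-aware sort that skips spaces and punctuation (real locales are more complicated). The three-line table is invented for illustration.

```python
def c_locale_sort(lines):
    """Sort lines by their raw UTF-8 bytes, as sort does under LC_ALL=C."""
    return sorted(lines, key=lambda line: line.encode("utf-8"))


def loose_sort(lines):
    """Hypothetical stand-in for a locale-aware sort: spaces and
    punctuation are ignored when comparing. NOT a real locale."""
    def key(line):
        return "".join(ch for ch in line if ch.isalnum())
    return sorted(lines, key=key)


def sources(lines):
    """Return the source phrase (text before the first ' ||| ') of each line."""
    return [line.split(" ||| ", 1)[0] for line in lines]


# Invented phrase-table lines: two entries for "a b", one for "ab".
table = [
    "a b ||| x ||| 0.5",
    "ab ||| y ||| 0.5",
    "a b ||| z ||| 0.5",
]

# Byte order keeps every line that starts with the same "source ||| "
# prefix next to each other.
print(sources(c_locale_sort(table)))

# The loose comparison lets the "ab" entry land between the two "a b"
# entries, so the same source phrase shows up in two separate runs.
print(sources(loose_sort(table)))
```

If the binarizer expects all entries for one source phrase to be adjacent, the second ordering is the kind of input that could plausibly trigger the "already inserted" error.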


Kevin.

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of
[EMAIL PROTECTED]
Sent: Monday, July 14, 2008 11:46 AM
To: [email protected]
Subject: Moses-support Digest, Vol 21, Issue 8


Today's Topics:

   1. Re: OT: LDC2004E12 (Ham, Michael)
   2. Re: phrase table memory issue (Philipp Koehn)
   3. Re: Re : [getting started] help (Philipp Koehn)
   4. Re: phrase table memory issue
      (Megan Elmore ([EMAIL PROTECTED]))
   5. Re: [Bulk] Re:  phrase table memory issue (Hieu Hoang)


----------------------------------------------------------------------

Message: 1
Date: Sun, 13 Jul 2008 22:08:14 -0400
From: "Ham, Michael" <[EMAIL PROTECTED]>
Subject: Re: [Moses-support] OT: LDC2004E12
To: <[email protected]>
Message-ID: <[EMAIL PROTECTED]>
Content-Type: text/plain;       charset="us-ascii"

Those escape sequences represent Unicode characters.  Chinese characters
cannot be represented in ASCII, so you have to use UTF-8.

However, in addition to doing this, you also need to install a font that
can show Chinese characters.  One that I have gotten to work that you
may want to look into is the Bitstream Cyberbit font.  You can download
it here:
http://http.netscape.com.edgesuite.net/pub/communicator/extras/fonts/windows/Cyberbit.ZIP

I hope this helps!
- Michael

------------------------------

Date: Fri, 11 Jul 2008 15:39:11 -0400
From: "John D. Burger" <[EMAIL PROTECTED]>
Subject: [Moses-support] OT: LDC2004E12
To: [email protected]
Message-ID: <[EMAIL PROTECTED]>
Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed

Sorry for the slightly off-topic message, but at least it's about MT:

We're using the UN Chinese-English Parallel Text collection  
(LDC2004E12) for some of our work.  It has lots of odd sequences of  
the form:

   \x{a37e}

I presume these are hex codes indicating escaped characters or  
something, but I'm not sure what.  Has anyone done anything with  
these, other than ignore or delete them?

Thanks.

- John Burger
   MITRE


------------------------------

Message: 2
Date: Sat, 12 Jul 2008 10:16:21 +0000 (UTC)
From: Vineet Kashyap <[EMAIL PROTECTED]>
Subject: [Moses-support] Unknown words
To: [email protected]
Message-ID: <[EMAIL PROTECTED]>
Content-Type: text/plain; charset=us-ascii

Hi all

1. Is there a way to output unknown words to a separate
file instead of dropping them? I think we could add
those words to the dictionary, which would improve
accuracy.

2. Also, when adding a dictionary to the parallel corpus as
suggested by Philipp in the previous post, you have one
word in the source language and the other in the target
language. Is that correct?

3. Does BLEU use a reference file with accurate human
translations to estimate a score? If not, would it
be better to evaluate the system with such a reference file?

4. What BLEU value (as a percentage) indicates good translations?
   And, for comparison purposes, how would a human judge an MT system's
   performance?

5. Can we train higher-order language models with SRILM on
a small corpus, or do we have to use IRSTLM?


Thanks a lot in advance for taking the time in answering these
questions.

Regards, Vineet



------------------------------

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support


End of Moses-support Digest, Vol 21, Issue 7
********************************************



------------------------------

Message: 2
Date: Mon, 14 Jul 2008 06:46:47 +0100
From: "Philipp Koehn" <[EMAIL PROTECTED]>
Subject: Re: [Moses-support] phrase table memory issue
To: "Megan Elmore ([EMAIL PROTECTED])" <[EMAIL PROTECTED]>
Cc: [email protected]
Message-ID:
        <[EMAIL PROTECTED]>
Content-Type: text/plain; charset=ISO-8859-1

Hi,

are you sorting the phrase table?
Check the command as described on the Moses web site.

-phi
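
One way to check whether a phrase table is grouped the way the binarizer seems to expect is sketched below. This is only an illustration under the assumption that the error appears when the same source phrase occurs in two separate runs of lines; find_split_sources is a made-up helper, not part of Moses.

```python
def find_split_sources(lines):
    """Return source phrases that occur in more than one separate run.

    Each line is assumed to look like 'source ||| target ||| scores'.
    """
    seen = set()     # source phrases whose first run has started
    split = []       # source phrases found again after their run ended
    previous = None
    for line in lines:
        source = line.split(" ||| ", 1)[0]
        if source != previous:
            # A new run starts here; it is a problem if we saw it before.
            if source in seen and source not in split:
                split.append(source)
            seen.add(source)
            previous = source
    return split


# Invented example: "a b" appears in two runs separated by "ab".
unsorted_table = ["a b ||| x", "ab ||| y", "a b ||| z"]
print(find_split_sources(unsorted_table))

# After a byte-order sort the runs are merged again.
byte_sorted = sorted(unsorted_table, key=lambda s: s.encode("utf-8"))
print(find_split_sources(byte_sorted))
```

An empty result means every source phrase sits in a single contiguous block.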

On Wed, Jul 9, 2008 at 8:21 PM, Megan Elmore ([EMAIL PROTECTED])
<[EMAIL PROTECTED]> wrote:
> Hello,
>
> Thanks very much for your quick reply. I am currently trying to
> generate a binary phrase table but am getting an error:
>
> ERROR: xsource phrase already inserted (B)!
> line(17): '000 - ||| 000 ? ||| (0) (1) ||| (0) (1) ||| 0.5 0.540651 0.25 0.178456 2.718'
> f: 2 0 2
>
> Does this indicate a problem with my phrase table or with the
> processPhraseTable process? In the event that I need to run the training
> process differently - which error or warning messages, if any, generated
> during the training process would let me know of any errors in
> my phrase table?
>
> Currently, the phrase table generated during the training process was
> left in a gzip'ped format as phrase-table.0-0.gz - I am not sure if this
> is relevant, but maybe the odd naming (as opposed to just "phrase-table"
> listed in the online documentation) sheds light on a step of the
> training process that did not complete normally for me?
>
> -Megan
>
> ----- Original Message -----
> From: Philipp Koehn <[EMAIL PROTECTED]>
> Date: Wednesday, July 9, 2008 2:25 pm
> Subject: Re: [Moses-support] phrase table memory issue
> To: "Megan Elmore ([EMAIL PROTECTED])" <[EMAIL PROTECTED]>
> Cc: [email protected]
>
>> Hi,
>>
>> this is a sign that the phrase table is too big to load into memory,
>> there are three options:
>> (a) use the binary phrase table
>> (b) filter the phrase table for the test set you are using
>> (c) both
>>
>> See the Moses web page for details.
>>
>> -phi
>>
>> On Wed, Jul 9, 2008 at 7:17 PM, Megan Elmore ([EMAIL PROTECTED])
>> <[EMAIL PROTECTED]> wrote:
>> > Hello,
>> >
>> > I have installed Moses and run the training process using the
>> > europarl corpus but am now having problems with the decoder loading
>> > the phrase table. Like a previous message on this list, I am
>> > getting the error
>> >
>> > terminate called after throwing an instance of 'std::bad_alloc'
>> >  what():  St9bad_alloc
>> > Aborted
>> >
>> > while the decoder is trying to load the phrase table, regardless
>> > of the machine I run the decoder on (I've tried four now). Is there
>> > a way I can optimize how much space the phrase table uses? Or is
>> > there something that could be going wrong in the training or
>> > decoding processes? I am not sure where to look for the error but
>> > with a little direction I could keep trying to debug it.
>> >
>> > Thanks,
>> > -Megan E.



------------------------------

Message: 3
Date: Mon, 14 Jul 2008 07:14:38 +0100
From: "Philipp Koehn" <[EMAIL PROTECTED]>
Subject: Re: [Moses-support] Re : [getting started] help
To: "Pham Thi Anh Vi" <[EMAIL PROTECTED]>
Cc: [email protected]
Message-ID:
        <[EMAIL PROTECTED]>
Content-Type: text/plain; charset=UTF-8

Hi,

your compile with the irstlm did not work correctly,
otherwise it would recognize the language model
option.

-phi

On Mon, Jul 7, 2008 at 2:17 AM, Pham Thi Anh Vi <[EMAIL PROTECTED]>
wrote:
> Hi Mailing list,
>
> On Tue, Apr 22, 2008 at 9:31 PM, sushil ronghe <[EMAIL PROTECTED]> wrote:
>> > Dear all,
>> >
>> > I am trying to compile Moses with the irstlm library in the path.
>> > The compilation did not give any error, so I thought it was done.
>> >
>> > Then, while testing with the sample model
>> > http://www.statmt.org/moses/download/sample-models.tgz
>> > I got these messages:
>> > -------------------------
>
>> > Defined parameters (per moses.ini or switch):
>> >         config: moses.ini
>> >         input-factors: 0
>> >         lmodel-file: 0 0 3 ../lm/europarl.srilm.gz
>> >         mapping: T 0
>
>> >          ttable-file: 0 0 1 phrase-table
>> >         ttable-limit: 10
>> >         weight-d: 1
>> >         weight-l: 1
>> >         weight-t: 1
>> >         weight-w: 0
>
>> > Loading lexical distortion models...
>> > have 0 models
>> > Start loading LanguageModel ../lm/europarl.srilm.gz : [0.000] seconds
>> > ERROR:Language model type unknown. Probably not compiled into library
>> > ERROR:no LM created. We probably don't have it compiled
>> >
>> > I am unable to understand what this error message is suggesting.
>> >
>> > I have installed the moses and irstlm on  i686  with OS ubuntu.
>
>> > the compilation has not given any error.
>> >
>> > please help me to figure out what is going wrong.
>> >
>> >
>> > Thanks
>
> I have the same error. I supplied the following setting for the language
> model switch, as Emmanuel did:
>
> lmodel-file: 1 0 5 ../lm/europarl.srilm.blm
>
> but the error is still there. Here is my Moses config:
>
> #########################
> ### MOSES CONFIG FILE ###
> #########################
> # input factors
> [input-factors]
> 0
>
> # mapping steps
> [mapping]
> 0 T 0
>
> # translation tables: source-factors, target-factors, number of scores, file
>
> [ttable-file]
> 1 0 5 /home/zil/Working/Language_models/Rbtdfinal_280k/model/binary_phrasetable/phrase-table.0-0.gz
>
> # no generation models, no generation-file section
>
> # language models: type(srilm/irstlm), factors, order, file
>
> [lmodel-file]
> 1 0 5 /hoe/zil/Working/Language_models/Rbtdfinal_280k/binary_lm/GwtwVnthuquan.blm
>
>
> # limit on how many phrase translations e for each phrase f are loaded
> # 0 = all elements loaded
> [ttable-limit]
> 20
> 0
> # distortion (reordering) files
> [distortion-file]
> 0-0 msd-bidirectional-fe 6 /home/zil/Working/Language_models/Rbtdfinal_280k/model/binary_reordering/reordering-table.msd-bidirectional-fe.0.5.0-0.gz
>
>
> # distortion (reordering) weight
> [weight-d]
> 0.3
> 0.3
> 0.3
> 0.3
> 0.3
> 0.3
> 0.3
>
> # language model weights
> [weight-l]
> 0.5000
>
>
> # translation model weights
> [weight-t]
> 0.2
> 0.2
>
> 0.2
> 0.2
> 0.2
>
> # no generation models, no weight-generation section
>
> # word penalty
> [weight-w]
> -1
>
> [distortion-limit]
> 6
>
> Please help me to figure out what is going wrong.
> --
> =============================
> Pham Thi Anh Vi
> VIEGRID JSC Hue
> Mobile phone : 0984693313
>
>
>



------------------------------

Message: 4
Date: Mon, 14 Jul 2008 11:17:17 -0400
From: "Megan Elmore ([EMAIL PROTECTED])" <[EMAIL PROTECTED]>
Subject: Re: [Moses-support] phrase table memory issue
To: [email protected]
Message-ID: <[EMAIL PROTECTED]>
Content-Type: text/plain; charset=iso-8859-1

Hello again,

Yes, I was using the command as described on the Moses web site at
http://www.statmt.org/moses/?n=Moses.AdvancedFeatures. I have also tried
piping the results from sort through uniq before piping it into
processPhraseTable and encountered the same error. Perhaps I am unaware
of some option to provide to sort or uniq to alleviate this problem. At
what step in the code for processPhraseTable would this error be
generated?

-Megan




------------------------------

Message: 5
Date: Mon, 14 Jul 2008 16:45:08 +0100
From: "Hieu Hoang" <[EMAIL PROTECTED]>
Subject: Re: [Moses-support] [Bulk] Re:  phrase table memory issue
To: "'Megan Elmore'" <[EMAIL PROTECTED]>, <[email protected]>
Message-ID: <[EMAIL PROTECTED]>
Content-Type: text/plain;       charset="iso-8859-1"

I get the same problem too.

The issue seems to be with the obtuse Unix sort command.

In some versions of sort, it may be sorting by a hash index rather than
doing an alphanumeric sort. Therefore, you need to force it to do an
alphanumeric sort:
         sort  -t"|" -k1,1
This fixed it for me. It's not the perfect solution, but it'll do for
now.
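
To make the effect of that sort key concrete, here is a small sketch that orders lines by the text before the first "|" only, compared as raw bytes. It illustrates the idea and is not a drop-in replacement for the command above; sort_by_source and the sample lines are invented for the example.

```python
def sort_by_source(lines):
    """Order lines by the field before the first '|', compared as bytes.

    Python's sorted() is stable, so lines with the same first field keep
    their original relative order.
    """
    def key(line):
        first_field = line.split("|", 1)[0]
        return first_field.encode("utf-8")
    return sorted(lines, key=key)


# Invented phrase-table lines in scrambled order.
table = [
    "b ||| 2",
    "a b ||| x",
    "a ||| 1",
    "a b ||| z",
]

for line in sort_by_source(table):
    print(line)
```

Because only the first field is compared, the target side and the scores cannot pull apart lines that share a source phrase. As far as I know, GNU sort also falls back to comparing the whole line when keys tie unless it is given -s, so the order inside a group may differ from this sketch.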

Unix - guaranteed to give you a headache

Hieu Hoang
www.hoang.co.uk/hieu


-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.




End of Moses-support Digest, Vol 21, Issue 8
********************************************
