Hi,

It seems you are stepping into territory that nobody has really tested, so the
tools are not quite ready for factors combined with explicit sentence boundaries.

You may try using <s>|<s> and </s>|</s> as the sentence boundaries. Once the
corpus is split into the two factors, this gives you correct sentence
boundaries for LM training. However, if you want to base your LM on both
factors at once, this trick won't work, since IRSTLM won't recognize the
combined tokens as sentence boundaries...
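
The <s>|<s> trick could be sketched roughly like this. It assumes one sentence
per line, exactly two factors (surface|POS) and "|" as the factor separator.
The function names add_factored_boundaries and extract_factor are made up for
illustration; they are not Moses or IRSTLM tools.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Moses or IRSTLM.
# Assumes one sentence per line and exactly two factors joined by "|".

# Wrap every line in factored boundary tokens: <s>|<s> ... </s>|</s>
add_factored_boundaries() {
  sed -e 's#^#<s>|<s> #' -e 's#$# </s>|</s>#'
}

# Print factor K (0-based) of every token, e.g. K=1 turns "cat|NN" into "NN".
extract_factor() {
  awk -v k="$1" '{
    out = ""
    for (i = 1; i <= NF; i++) {
      split($i, f, "|")
      out = out (i > 1 ? " " : "") f[k + 1]
    }
    print out
  }'
}
```

For example, add_factored_boundaries < corpus.factored | extract_factor 1
should give a POS stream whose lines start with a plain <s> and end with </s>;
factor 0 gives the surface stream. An LM over whole surface|POS tokens would
still see the token <s>|<s>, which is why the trick does not help there.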

I am afraid you will have to prepare the files manually: one set for LM
creation and a separate one for grammar extraction (with no sentence
boundaries there).
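
Before grammar extraction, a quick check along these lines might catch the
"<s> has no second factor" problem early. It is only a sketch assuming "|" as
the factor separator; check_factored_corpus is a hypothetical name. It reports
every token whose factor count differs from the expected one, plus any leftover
boundary token, and exits non-zero if it finds any.

```shell
#!/bin/sh
# Hypothetical checker, not part of Moses.
# Usage: check_factored_corpus NFACTORS < corpus.factored
check_factored_corpus() {
  awk -v n="$1" '{
    for (i = 1; i <= NF; i++) {
      c = split($i, f, "|")
      if (c != n) {
        # e.g. a bare <s> added by add-start-end.sh has only one factor
        printf "line %d: token %s has %d factor(s), expected %d\n", NR, $i, c, n
        bad = 1
      } else if (f[1] == "<s>" || f[1] == "</s>") {
        # fully factored boundary tokens do not belong here either
        printf "line %d: sentence boundary token %s\n", NR, $i
        bad = 1
      }
    }
  }
  END { exit (bad ? 1 : 0) }'
}
```

Run with 2 on the grammar-extraction corpus, it should print nothing; a <s>
left over from add-start-end.sh would show up as a token with one factor.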

Try first getting the system trained in a simpler setup, without factors, and
if that works, mimic it for each of the factors.
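
Mimicking the plain setup per factor could look roughly like this: one pass
over the factored corpus writes one unfactored LM training file per factor,
each line wrapped in plain <s> ... </s>, which as far as I know is what
add-start-end.sh does for an unfactored corpus. The function name
split_factors_for_lm and the PREFIX.fK file naming are made up for this sketch.

```shell
#!/bin/sh
# Hypothetical helper, not a Moses/IRSTLM tool.
# Usage: split_factors_for_lm PREFIX NFACTORS < corpus.factored
# Writes PREFIX.f0 ... PREFIX.f<NFACTORS-1>; line j of PREFIX.fK holds
# factor K of every token of input line j, wrapped in <s> ... </s>.
split_factors_for_lm() {
  awk -v prefix="$1" -v n="$2" '{
    for (k = 1; k <= n; k++) line[k] = "<s>"
    for (i = 1; i <= NF; i++) {
      split($i, f, "|")
      for (k = 1; k <= n; k++) line[k] = line[k] " " f[k]
    }
    # awk truncates each output file on its first write, then appends.
    for (k = 1; k <= n; k++) print line[k] " </s>" > (prefix ".f" (k - 1))
  }'
}
```

Each PREFIX.fK file can then go to the LM toolkit exactly as in the
unfactored setup, giving e.g. one surface LM and one POS LM.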

Best, O.

On November 8, 2014 10:20:08 AM CET, Marwa Refaie <[email protected]> wrote:
>Hi all,
>Why am I still stuck on the same error? I cut the data down and made sure
>it is valid. Then I ran c:/irstlm/bin/add-start-end.sh <unH.en > unHse.en
>before creating the LMs (POS and surface) using SRILM. ... I can't apply
>add-start-end to the corpus file, as Moses then fails during training with
>"<s> has no second factor".
>Please help, I need to resolve this quickly.
>Thanks
>
>  
> 
>Marwa N. Refaie
>
>
>
>> Subject: Re: [Moses-support] Factored LM / <s></s>
>> From: [email protected]
>> Date: Thu, 9 Oct 2014 07:40:28 +0200
>> To: [email protected]; [email protected]
>> 
>> Dear Marwa,
>> 
>> Try cutting the bad data in half, and then in half again, etc., to get a
>> very small input that still triggers the error. Then you'll probably
>> realize what the problem is, or you can at least send it to the mailing
>> list.
>> 
>> Cheers, O.
>> 
>> 
>> On October 9, 2014 2:10:12 AM CEST, Marwa Refaie <[email protected]> wrote:
>> >How should I fix this error? Tokenizing didn't make a difference! How do I
>> >normalize the data or set sentence boundaries?
>> >
>> >Start loading text SCFG phrase table. Moses  format : [1.000] seconds
>> >Reading /cygdrive/c/mosesdecoder-master/try/ai/sep/fsmt/work/model/phrase-table.0,1-0,1.gz
>> >----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
>> >Either your data contains <s> in a position other than the first word or your language model is missing <s>.  Did you build your ARPA using IRSTLM and forget to run add-start-end.sh?
>> > 
>> >Marwa N. Refaie
>> >
>> >                                      
>> >
>> >------------------------------------------------------------------------
>> >
>> >_______________________________________________
>> >Moses-support mailing list
>> >[email protected]
>> >http://mailman.mit.edu/mailman/listinfo/moses-support
>> 
>> -- 
>> Ondrej Bojar (mailto:[email protected] / [email protected])
>> http://www.cuni.cz/~obo
>> 
>                                         

-- 
Ondrej Bojar (mailto:[email protected] / [email protected])
http://www.cuni.cz/~obo


