You also have to escape the characters | < and >, not just [ and ].

It's best to use the script
   scripts/tokenizer/escape-special-chars.perl
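
If you'd rather see what the escaping amounts to, here is a minimal Python sketch. The entity mapping is my assumption of what the script emits, so check it against the Perl script in your own checkout before relying on it. I believe the real script also escapes quote characters, which I have left out here. `ESCAPES` and `escape_line` are names made up for this example.

```python
# Sketch only: the mapping below is an assumption to verify against
# scripts/tokenizer/escape-special-chars.perl in your Moses checkout.
ESCAPES = [
    ("&", "&amp;"),   # must come first so later entities are not re-escaped
    ("|", "&#124;"),  # factor separator
    ("<", "&lt;"),    # XML markup
    (">", "&gt;"),    # XML markup
    ("[", "&#91;"),   # non-terminal brackets in chart/hiero rules
    ("]", "&#93;"),   # non-terminal brackets in chart/hiero rules
]


def escape_line(line: str) -> str:
    """Replace each special character with its entity, in list order."""
    for raw, entity in ESCAPES:
        line = line.replace(raw, entity)
    return line


print(escape_line("see [1] | <b>"))
```

Both sides of the corpus need the same treatment, and it has to happen before training, so the rule table is extracted again from the escaped text.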

The size of the data doesn't matter; the pre-made models that come with
RELEASE-1.0 are trained on 1m+ sentences.

But as Ken says, make sure none of the intermediate training steps
crashed because you ran out of disk space.
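
One cheap way to check for that is to confirm the gzipped phrase table decompresses all the way to the end, since a file cut short by a full disk will not. The sketch below uses only the Python standard library; `gzip_is_complete` is a hypothetical helper name, and it only detects truncation or corruption, not wrong contents.

```python
import gzip
import zlib


def gzip_is_complete(path: str, chunk_size: int = 1 << 20) -> bool:
    """Return True if the gzip file at `path` can be read through to the end."""
    try:
        with gzip.open(path, "rb") as handle:
            # Read and discard the data; we only care whether reading fails.
            while handle.read(chunk_size):
                pass
    except (EOFError, OSError, zlib.error):
        # EOFError: stream ended early (truncated file).
        # OSError / zlib.error: not a gzip file, or corrupted data.
        return False
    return True
```

For the run in this thread you would call it on ./hiero-tune/filtered/phrase-table.0-0.1.1.gz and on the unfiltered table it was built from.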


On 24 April 2013 03:15, Wenliang Chen <[email protected]> wrote:

> Hi,
>
> I removed the characters [] from the training data, but still got the same
> error. I also tried a smaller subset (1/5) of the training data that still
> contains [], and the system worked well.
>
> Wen
>
>
> On Tue, Apr 23, 2013 at 4:41 PM, Kenneth Heafield <[email protected]> wrote:
>
>> Your training data should not contain the characters [ or ].  If you're
>> using the Moses tokenizer, these would have been escaped for you.
>> Otherwise, you're on your own to escape them.
>>
>> Kenneth
>>
>>
>> On 04/23/13 09:05, Wenliang Chen wrote:
>>
>>> Hi, Kenneth
>>>
>>> Thanks for your information.
>>>  > The problem is that you have a token of the form [foo] when it's
>>> expecting the token to be [foo][bar] for labels on both sides.
>>> Do you know how to fix it? I now have about 500K sentence pairs. If I
>>> reduce the size to 100K sentence pairs, tuning runs fine.
>>> And I am sure the machine has enough disk space and memory.
>>>
>>> Wen
>>>
>>> On Tue, Apr 23, 2013 at 3:38 PM, Kenneth Heafield <[email protected]> wrote:
>>>
>>>     I've added error messages to master.  Note that every CHECK is a TODO
>>>     for replacement with an actual error message.
>>>
>>>     The problem is that you have a token of the form [foo] when it's
>>>     expecting the token to be [foo][bar] for labels on both sides.
>>>
>>>     Kenneth
>>>
>>>     On 04/23/13 05:52, Wenliang Chen wrote:
>>>      > Hi, All
>>>      >
>>>      > I used moses-1.0 and ran moses-chart.
>>>      >
>>>      > Training finished successfully, but I got an error when
>>>      > starting the tuning.
>>>      >
>>>      > The information is as follows:
>>>      > ...
>>>      > Start loading text SCFG phrase table. Moses  format : [0.879] seconds
>>>      > Reading ./hiero-tune/filtered/phrase-table.0-0.1.1.gz
>>>      > ----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
>>>      > ********************************Check nextPos != string::npos failed in moses/Phrase.cpp:202
>>>      > Aborted (core dumped)
>>>      > Exit code: 134
>>>      > Failed to run moses with the config filtered/moses.ini at
>>>      > ../mosesdecoder-RELEASE-1.0/scripts/training/mert-moses-chart.pl line 1169.
>>>      > ...
>>>      >
>>>      >
>>>      > Best
>>>      >
>>>      > Wen
>>>      >
>>>      >
>>>      >
>>>      > _______________________________________________
>>>      > Moses-support mailing list
>>>      > [email protected]
>>>      > http://mailman.mit.edu/mailman/listinfo/moses-support
>>>      >
>>>
>>>
>>>
>


-- 
Hieu Hoang
Research Associate
University of Edinburgh
http://www.hoang.co.uk/hieu