Thanks Hieu. I used scripts/tokenizer/escape-special-chars.perl to preprocess the data and it worked well.
Wen.

On Wed, Apr 24, 2013 at 3:27 PM, Hieu Hoang <[email protected]> wrote:

> you have to escape these characters as well
>    |<>
>
> best to use the script
>    scripts/tokenizer/escape-special-chars.perl
> the size of the data doesn't matter, the pre-made models that comes with
> RELEASE-1.0 are trained on 1m+ sentences.
>
> But as Ken says, make sure some intermediate steps in the training didn't
> crash because you ran out of disk space
>
>
> On 24 April 2013 03:15, Wenliang Chen <[email protected]> wrote:
>
>> Hi,
>>
>> I removed the characters [] from the training data, but still met the
>> same error. I tried using small sizes (1/5) of training data that contain
>> [], and the system worked well.
>>
>> Wen
>>
>>
>> On Tue, Apr 23, 2013 at 4:41 PM, Kenneth Heafield <[email protected]> wrote:
>>
>>> Your training data should not contain the characters [ or ]. If you're
>>> using the Moses tokenizer, these would have been escaped for you.
>>> Otherwise, you're on your own to escape them.
>>>
>>> Kenneth
>>>
>>>
>>> On 04/23/13 09:05, Wenliang Chen wrote:
>>>
>>>> Hi, Kenneth
>>>>
>>>> Thanks for your information.
>>>> > The problem is that you have a token of the form [foo] when it's
>>>> > expecting the token to be [foo][bar] for labels on both sides.
>>>> Do you know how to fix it? Now i have about 500K sentence pairs. If I
>>>> reduce the size to 100K sentence pairs, the system for tuning runs well.
>>>> And I am sure the machine has enough disk and memory spaces.
>>>>
>>>> Wen
>>>>
>>>> On Tue, Apr 23, 2013 at 3:38 PM, Kenneth Heafield <[email protected]> wrote:
>>>>
>>>>     I've added error messages to master. Note that every CHECK is a TODO
>>>>     for replacement with an actual error message.
>>>>
>>>>     The problem is that you have a token of the form [foo] when it's
>>>>     expecting the token to be [foo][bar] for labels on both sides.
>>>>
>>>>     Kenneth
>>>>
>>>>     On 04/23/13 05:52, Wenliang Chen wrote:
>>>>     > Hi, All
>>>>     >
>>>>     > I used moses-1.0 and ran moses-chart.
>>>>     >
>>>>     > The training stage was finished successfully, but met an error when
>>>>     > start the tuning.
>>>>     >
>>>>     > The information is as follows:
>>>>     > ...
>>>>     > Start loading text SCFG phrase table. Moses format : [0.879] seconds
>>>>     > Reading ./hiero-tune/filtered/phrase-table.0-0.1.1.gz
>>>>     > ----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
>>>>     > ********************************Check nextPos != string::npos failed in
>>>>     > moses/Phrase.cpp:202
>>>>     > Aborted (core dumped)
>>>>     > Exit code: 134
>>>>     > Failed to run moses with the config filtered/moses.ini at
>>>>     > ../mosesdecoder-RELEASE-1.0/scripts/training/mert-moses-chart.pl line 1169.
>>>>     > ...
>>>>     >
>>>>     >
>>>>     > Best
>>>>     >
>>>>     > Wen
>
> --
> Hieu Hoang
> Research Associate
> University of Edinburgh
> http://www.hoang.co.uk/hieu
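For anyone who lands on this thread with the same "Check nextPos != string::npos" failure: the fix discussed above is to escape the characters [ ] | < > in the training data before building the model. The snippet below is only a rough Python sketch of that idea, not a replacement for scripts/tokenizer/escape-special-chars.perl. The function names are made up for illustration, and the entity mapping is an assumption that should be checked against the script shipped with your Moses release.

```python
# Hypothetical helpers, not part of Moses. The character-to-entity mapping
# is an assumption; verify it against escape-special-chars.perl.
ESCAPES = [
    ("&", "&amp;"),   # first, so the entities added below are not re-escaped
    ("|", "&#124;"),  # factor separator
    ("<", "&lt;"),    # XML markup
    (">", "&gt;"),    # XML markup
    ("[", "&#91;"),   # non-terminal label bracket
    ("]", "&#93;"),   # non-terminal label bracket
]


def escape_special_chars(line: str) -> str:
    """Replace each special character with its entity, in list order."""
    for raw, entity in ESCAPES:
        line = line.replace(raw, entity)
    return line


def find_unescaped(line: str) -> list:
    """Return the sorted special characters still present in a line."""
    return sorted({ch for ch in line if ch in "|<>[]"})


if __name__ == "__main__":
    sample = "see [foo] | <bar>"
    print(find_unescaped(sample))
    print(escape_special_chars(sample))
```

Running find_unescaped over every line of the corpus before training is a cheap way to confirm that nothing slipped through, which is what bit the 500K-sentence run here while the smaller subsets happened to work.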
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
