You have to escape these characters as well: | < >. It is best to use the script scripts/tokenizer/escape-special-chars.perl for this. The size of the data doesn't matter; the pre-made models that come with RELEASE-1.0 are trained on 1m+ sentences.
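To make the escaping concrete, here is a minimal Python sketch of the kind of replacement involved, plus a helper that lists lines still containing reserved characters. The entity mapping below is an assumption modeled on what escape-special-chars.perl is understood to do, and the function names are hypothetical; the real script may cover more characters (quotes, for example), so prefer it for actual training data.

```python
# Assumed mapping, modeled on the Moses escaping script; not copied from it.
# "&" must be handled first so the entities produced below are not re-escaped.
MOSES_ESCAPES = [
    ("&", "&amp;"),
    ("|", "&#124;"),  # factor separator
    ("<", "&lt;"),    # XML markup
    (">", "&gt;"),    # XML markup
    ("[", "&#91;"),   # non-terminal brackets used by moses-chart
    ("]", "&#93;"),
]


def escape_special_chars(line):
    """Return the line with reserved characters replaced by entities."""
    for raw, escaped in MOSES_ESCAPES:
        line = line.replace(raw, escaped)
    return line


def lines_needing_escape(lines):
    """Yield (line_number, line) for lines that still contain | < > [ or ]."""
    reserved = set("|<>[]")
    for number, line in enumerate(lines, start=1):
        if reserved & set(line):
            yield number, line


if __name__ == "__main__":
    sample = ["a clean sentence", "see [figure 1] | page <3>"]
    for number, line in lines_needing_escape(sample):
        print(number, escape_special_chars(line))
```

Running the second helper over both sides of the training corpus is a quick way to confirm that no unescaped characters are left before retraining.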
But as Ken says, make sure some intermediate steps in the training didn't crash because you ran out of disk space.

On 24 April 2013 03:15, Wenliang Chen <[email protected]> wrote:

> Hi,
>
> I removed the characters [] from the training data, but still met the same
> error. I tried using small sizes (1/5) of training data that contain [],
> and the system worked well.
>
> Wen
>
> On Tue, Apr 23, 2013 at 4:41 PM, Kenneth Heafield <[email protected]> wrote:
>
>> Your training data should not contain the characters [ or ]. If you're
>> using the Moses tokenizer, these would have been escaped for you.
>> Otherwise, you're on your own to escape them.
>>
>> Kenneth
>>
>> On 04/23/13 09:05, Wenliang Chen wrote:
>>
>>> Hi, Kenneth
>>>
>>> Thanks for your information.
>>>
>>> > The problem is that you have a token of the form [foo] when it's
>>> > expecting the token to be [foo][bar] for labels on both sides.
>>>
>>> Do you know how to fix it? Now i have about 500K sentence pairs. If I
>>> reduce the size to 100K sentence pairs, the system for tuning runs well.
>>> And I am sure the machine has enough disk and memory spaces.
>>>
>>> Wen
>>>
>>> On Tue, Apr 23, 2013 at 3:38 PM, Kenneth Heafield <[email protected]> wrote:
>>>
>>>     I've added error messages to master. Note that every CHECK is a TODO
>>>     for replacement with an actual error message.
>>>
>>>     The problem is that you have a token of the form [foo] when it's
>>>     expecting the token to be [foo][bar] for labels on both sides.
>>>
>>>     Kenneth
>>>
>>>     On 04/23/13 05:52, Wenliang Chen wrote:
>>>     > Hi, All
>>>     >
>>>     > I used moses-1.0 and ran moses-chart.
>>>     >
>>>     > The training stage was finished successfully, but met an error when
>>>     > start the tuning.
>>>     >
>>>     > The information is as follows:
>>>     > ...
>>>     > Start loading text SCFG phrase table. Moses format : [0.879] seconds
>>>     > Reading ./hiero-tune/filtered/phrase-table.0-0.1.1.gz
>>>     > ----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
>>>     > ********************************Check nextPos != string::npos failed in
>>>     > moses/Phrase.cpp:202
>>>     > Aborted (core dumped)
>>>     > Exit code: 134
>>>     > Failed to run moses with the config filtered/moses.ini at
>>>     > ../mosesdecoder-RELEASE-1.0/scripts/training/mert-moses-chart.pl
>>>     > line 1169.
>>>     > ...
>>>     >
>>>     > Best
>>>     >
>>>     > Wen

--
Hieu Hoang
Research Associate
University of Edinburgh
http://www.hoang.co.uk/hieu
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
