On Thu, 18-09-2008 at 02:44 +0800, Nirav wrote:
> Hi,
>
> Thanks for the reply. The problem is that the script of the Indian
> regional language is not Roman; even the punctuation marks are
> different. So how does Moses align sentences when it does not know
> the sentence terminator?

Again, iirc, sentences should be separated by line (newline character).

> Also, Moses has a lowercasing step. There is no concept of
> lowercasing in the Indian regional language, so what should I do
> about it?

Then it works as if it is already lowercased.

Fran

> ---
> Nirav Shah
>
> On Thu, Sep 18, 2008 at 2:34 AM, Francis Tyers <[EMAIL PROTECTED]>
> wrote:
>         On Thu, 18-09-2008 at 02:30 +0800, Nirav wrote:
>         > Hi,
>         >
>         > I would like to know how to align two files, one containing
>         > Unicode characters (an Indian regional language) and one
>         > containing ASCII text (English).
>         > Also, are any changes needed to train and evaluate the
>         > model?
>
>         It should Just Work™ -- afaik all the tools work with Unicode
>         text, although depending on the regional language in question
>         you might benefit from pre-tokenisation.
>
>         Fran

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
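
The two answers in this thread (one sentence per line, and lowercasing having no effect on a script without case) can be illustrated with a minimal Python sketch. This is not part of Moses. The function names `lowercase_line` and `pair_by_line` and the Hindi/English sample sentences are made up for illustration, and Hindi is only assumed as a stand-in for the unnamed regional language.

```python
def lowercase_line(line: str) -> str:
    # str.lower() only changes characters that have a lowercase mapping.
    # Devanagari has no letter case, so such text comes back unchanged.
    return line.lower()


def pair_by_line(src_text: str, tgt_text: str):
    # Sentences are paired purely by line number: line N of one side
    # goes with line N of the other. No sentence terminator is inspected.
    src = src_text.splitlines()
    tgt = tgt_text.splitlines()
    if len(src) != len(tgt):
        raise ValueError(
            f"line counts differ: {len(src)} source vs {len(tgt)} target"
        )
    return list(zip(src, tgt))


# Hypothetical sample data: two Hindi sentences ending in a danda,
# with their English counterparts on the matching lines.
hindi = "यह एक वाक्य है।\nयह दूसरा वाक्य है।\n"
english = "This is a sentence.\nThis is another sentence.\n"

pairs = pair_by_line(hindi, english)
for src, tgt in pairs:
    print(lowercase_line(src), "|||", lowercase_line(tgt))
```

Only the English side changes when lowercased. The pairing relies on newlines alone, so the danda needs no special handling, but both files must have the same number of lines.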
