Hi,

Thanks for the reply. The problem is that the Indian regional language is not written in Roman script, and even its punctuation marks are different. So how does Moses align sentences when it does not know the sentence terminator?
Also, Moses has a lowercasing step, but Indian regional languages have no concept of letter case. How should I handle that step?

---
Nirav Shah

On Thu, Sep 18, 2008 at 2:34 AM, Francis Tyers <[EMAIL PROTECTED]> wrote:
> On Thu, 18-09-2008 at 02:30 +0800, Nirav wrote:
> > Hi,
> >
> > I would like to know how to align two files, one containing
> > Unicode characters (Indian regional language) and one containing ASCII
> > text (English).
> > Also, are any changes needed to train and evaluate the model?
>
> It should Just Work™ -- afaik all the tools work with Unicode text,
> although depending on the regional language in question you might
> benefit from pre-tokenisation.
>
> Fran
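To make the two questions above concrete, here is a minimal Python sketch. It assumes the regional language is Hindi written in Devanagari (the thread does not name the language) and that sentences end with the danda (U+0964) or double danda (U+0965). The helper `split_sentences` and the sample text are hypothetical and are not part of Moses. The sketch only shows that a sentence splitter can be told about a non-Roman terminator, and that lowercasing a script without letter case leaves the text unchanged.

```python
import re

# Assumption: sentences end with danda (U+0964), double danda (U+0965),
# or the usual Latin terminators, followed by whitespace.
SENTENCE_END = re.compile(r"(?<=[\u0964\u0965.?!])\s+")


def split_sentences(text):
    """Hypothetical helper: split text into sentences, one per list item."""
    return [part for part in SENTENCE_END.split(text.strip()) if part]


# Illustrative sample text (two short Hindi sentences).
hindi = "यह एक वाक्य है। यह दूसरा वाक्य है।"
sentences = split_sentences(hindi)

# Devanagari has no upper/lower case, so lower() returns the same string.
lowercased = [s.lower() for s in sentences]

for original, lowered in zip(sentences, lowercased):
    print(original, "| unchanged by lower():", original == lowered)
```

Writing one sentence per line on each side, with line N of the English file corresponding to line N of the regional-language file, would then give a line-aligned pair of files.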
_______________________________________________ Moses-support mailing list [email protected] http://mailman.mit.edu/mailman/listinfo/moses-support
