On Thu, 18-09-2008 at 02:44 +0800, Nirav wrote:
> Hi,
> 
> Thanks for the reply. The problem is that the script of the Indian
> regional language is not Roman; even the punctuation marks are
> different. So how does Moses align sentences when it does not know the
> sentence terminator?

Again, iirc, sentences should be separated by newlines: one sentence per
line, with line N of one file corresponding to line N of the other.
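
If your corpus is not yet one sentence per line, something like the
following Python sketch could do the splitting before you hand the files
to Moses. It is not part of Moses. The function name is made up for
illustration, and it assumes the terminators are the danda (U+0964), the
double danda (U+0965) and the ASCII . ? ! characters; adjust the set for
your language.

```python
import re

# Assumption: these are the sentence terminators in the corpus.
# U+0964 is the danda, U+0965 the double danda; change as needed.
SENTENCE_RE = re.compile(r"[^\u0964\u0965.?!]+[\u0964\u0965.?!]*")


def split_sentences(text):
    """Return the sentences in text, each keeping its terminator."""
    sentences = []
    for match in SENTENCE_RE.finditer(text):
        sentence = match.group().strip()
        if sentence:
            sentences.append(sentence)
    return sentences


def one_sentence_per_line(text):
    """Join the sentences with newlines, ready to write to a corpus file."""
    return "\n".join(split_sentences(text))
```

After splitting both sides, it is worth checking that the two files have
the same number of lines, since the alignment relies on that.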

> Also, Moses has a lowercasing step, but there is no concept of
> lowercase in the Indian regional language. So what should I do about it?

Then lowercasing leaves that text unchanged, so it works as if it is
already lowercased.
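
You can convince yourself with a quick check. The sketch below uses
Python's str.lower(), not the Moses lowercasing script itself, and the
helper name is only for illustration; the point is that characters
without a case distinction pass through untouched while the English side
is lowercased as usual.

```python
def lowercase_line(line):
    """Lowercase one corpus line; caseless scripts come back unchanged."""
    return line.lower()


def is_unchanged_by_lowercasing(line):
    """True if lowercasing is a no-op for this line."""
    return lowercase_line(line) == line
```

For example, is_unchanged_by_lowercasing applied to a Devanagari string
such as "\u0928\u092e\u0938\u094d\u0924\u0947" is True, while for
"Hello World" it is False.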

Fran

> ---
> Nirav Shah
> 
> On Thu, Sep 18, 2008 at 2:34 AM, Francis Tyers <[EMAIL PROTECTED]>
> wrote:
>         On Thu, 18-09-2008 at 02:30 +0800, Nirav wrote:
>         
>         > Hi,
>         >
>         > I would like to know how to align two files, one containing
>         > Unicode characters (an Indian regional language) and one
>         > containing ASCII text (English). Also, are any changes needed
>         > to train and evaluate the model?
>         
>         
>         It should Just Work™ -- afaik all the tools work with Unicode
>         text, although depending on the regional language in question
>         you might benefit from pre-tokenisation.
>         
>         Fran

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
