Hi all, I was wondering whether we can write a script that automatically runs the necessary steps for training. Any pointers on this would be appreciated.
Also, I would like to know whether, in any corpus, symbols such as the full stop, question mark, comma, apostrophe, etc. play a significant role. That is, can these be included in the corpus? And why do we have to lowercase everything?

Another thing: I know that the corpus should be as big as possible, but there should be a threshold. Is there a point beyond which increasing the size won't improve the accuracy, or will it keep improving?

Thanks in advance.

Regards,
Vineet
_______________________________________________
Moses-support mailing list
http://mailman.mit.edu/mailman/listinfo/moses-support
