> --factorDelimiter=|

There is such a flag. I implemented this about 4 years ago, but AFAIK I'm the only one who ever uses it (and I still use it).
-C

> etc.
>
> Miles
>
> On 15 November 2010 21:30, Hieu Hoang <[email protected]> wrote:
>> That's a good idea. In the decoder, there are 4 places that have to be
>> changed cos it's hardcoded
>>   ConfusionNet
>>   GenerationDictionary
>>   LanguageModelJoint
>>   Word::createFromString
>>
>> However, the train-model.perl is more difficult to change
>>
>> Hieu
>> Sent from my flying horse
>>
>> On 15 Nov 2010, at 09:00 PM, Lane Schwartz <[email protected]> wrote:
>>
>>> I'd like to propose changing the current factor delimiter to something
>>> other than the single vertical bar |
>>>
>>> Looking through the mailing archives, it seems that the failure to properly
>>> purge your corpus of vertical bars is a frequent source of headaches for
>>> users. I know I've encountered this problem before, but even knowing that I
>>> should do this, just today I had to track down another vertical bar-related
>>> problem.
>>>
>>> I don't really care what the replacement character(s) ends up being, just
>>> so that any corpus munging related to this delimiter gets handled
>>> internally by moses rather than being the user's responsibility.
>>>
>>> If moses could easily be modified to take a multi-character delimiter, that
>>> would probably be best. My suggestion for a single-character delimiter
>>> would be something with the following characteristics:
>>>
>>> * Character should be printable (i.e. not a control character)
>>> * Character should be one that's implemented in most commonly used fonts
>>> * Character should be highly obscure, and extremely unlikely to appear in a
>>>   corpus
>>> * Character should not be confusable with any commonly used character.
>>>
>>> Many characters in the Dingbats section of Unicode (block 2700) would fit
>>> these desiderata.
>>>
>>> I suggest Unicode character 2759, MEDIUM VERTICAL BAR. This is a highly
>>> obscure printable character that looks like a thick vertical bar. It's
>>> obviously a vertical bar, but just as obviously not the same thing as the
>>> regular vertical bar |.
>>>
>>> Cheers,
>>> Lane
>>> _______________________________________________
>>> Moses-support mailing list
>>> [email protected]
>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>
> --
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
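
As a rough illustration of the corpus clean-up discussed in this thread, here is a minimal Python sketch. It is not Moses code: the function names (`lines_with_delimiter`, `is_usable_delimiter`, `replace_delimiter`, `split_factors`) are hypothetical, and U+2759 is used only because it is the character suggested above. The sketch shows how stray delimiters in a corpus might be detected and swapped out before factored tokens are split, and how two of the listed criteria (printable, absent from the corpus) could be checked mechanically.

```python
# Hypothetical helpers, not part of Moses.
MEDIUM_VERTICAL_BAR = "\u2759"  # the character proposed in the thread


def lines_with_delimiter(lines, delimiter="|"):
    """Return the 1-based numbers of lines that contain the delimiter."""
    return [n for n, line in enumerate(lines, start=1) if delimiter in line]


def is_usable_delimiter(delimiter, lines):
    """Check two of the criteria: printable, and absent from the corpus."""
    if not delimiter or not delimiter.isprintable():
        return False
    return not any(delimiter in line for line in lines)


def replace_delimiter(line, delimiter="|", replacement=MEDIUM_VERTICAL_BAR):
    """Swap every occurrence of the delimiter in a line for a replacement."""
    return line.replace(delimiter, replacement)


def split_factors(token, delimiter="|"):
    """Split one factored token (e.g. surface|POS|lemma) into its factors."""
    return token.split(delimiter)
```

For example, `lines_with_delimiter(["a | b", "clean line", "x|y"])` reports lines 1 and 3, which are the lines a user would otherwise have to track down by hand. The font-coverage and confusability criteria cannot be checked this way and still need human judgment.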
