I second this.

But can I make another suggestion? Make the default *non*-factored
input. I reckon that most people using Moses don't actually use
factors (hands up if you do).
That means plain input, with absolutely no meta-characters in it.

And if you are going to use meta-characters, why not just have a flag such as:

--factorDelimiter=|

etc.
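
A rough sketch of how that could behave, in Python rather than the decoder's C++. The names split_factors and parse_sentence are made up for illustration and are not part of Moses; treating "no delimiter given" as "input is not factored" is an assumption about how the default would work.

```python
from typing import List, Optional


def split_factors(token: str, delimiter: Optional[str] = None) -> List[str]:
    """Split one input token into its factors.

    With delimiter=None (the proposed default) the token is plain,
    non-factored input and comes back as a single factor, so characters
    such as '|' carry no special meaning.
    """
    if delimiter is None:
        return [token]
    if delimiter == "":
        raise ValueError("factor delimiter must not be empty")
    # str.split accepts multi-character separators, so a delimiter
    # like '|||' needs no special handling.
    return token.split(delimiter)


def parse_sentence(line: str, delimiter: Optional[str] = None) -> List[List[str]]:
    """Tokenise on whitespace, then split each token into factors."""
    return [split_factors(tok, delimiter) for tok in line.split()]
```

With no delimiter, "a|b" stays one plain token; with the delimiter set to "|", "house|NN" becomes the two factors house and NN. A multi-character delimiter is handled the same way.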

Miles

On 15 November 2010 21:30, Hieu Hoang <[email protected]> wrote:
> That's a good idea. In the decoder, there are 4 places that have to be
> changed because the delimiter is hardcoded:
>   ConfusionNet
>   GenerationDictionary
>   LanguageModelJoint
>   Word::createFromString
>
> However, train-model.perl is more difficult to change.
>
> Hieu
> Sent from my flying horse
>
> On 15 Nov 2010, at 09:00 PM, Lane Schwartz <[email protected]> wrote:
>
>> I'd like to propose changing the current factor delimiter to something other 
>> than the single vertical bar |
>>
>> Looking through the mailing archives, it seems that the failure to properly 
>> purge your corpus of vertical bars is a frequent source of headaches for 
>> users. I know I've encountered this problem before, but even knowing that I 
>> should do this, just today I had to track down another vertical bar-related 
>> problem.
>>
>> I don't really care what the replacement character(s) ends up being, just so 
>> that any corpus munging related to this delimiter gets handled internally by 
>> moses rather than being the user's responsibility.
>>
>> If moses could easily be modified to take a multi-character delimiter, that
>> would probably be best. My suggestion for a single-character delimiter would
>> be something with the following characteristics:
>>
>> * Character should be printable (i.e. not a control character)
>> * Character should be one that's implemented in most commonly used fonts
>> * Character should be highly obscure, and extremely unlikely to appear in a 
>> corpus
>> * Character should not be confusable with any commonly used character.
>>
>> Many characters in the Dingbats block of Unicode (U+2700 to U+27BF) would fit
>> these desiderata.
>>
>> I suggest Unicode character U+2759, MEDIUM VERTICAL BAR. This is a highly
>> obscure printable character that looks like a thick vertical bar. It's 
>> obviously a vertical bar, but just as obviously not the same thing as the 
>> regular vertical bar |.
>>
>> Cheers,
>> Lane
>> _______________________________________________
>> Moses-support mailing list
>> [email protected]
>> http://mailman.mit.edu/mailman/listinfo/moses-support
>



-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
