Hi,

Yes, that is true: it requires that your input be segmented in the same
way as the training data. This may or may not be a problem. One workaround
would be to use the lattice input format to provide multiple segmentations.
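
As a rough illustration of keeping the segmentation consistent, here is a
minimal Python sketch. It is not part of Moses: the function names (glue,
to_factored, glue_known_phrases) are hypothetical, and the phrase list is an
assumption you would have to supply yourself, for example by collecting the
multi-word tokens seen in the tagged training corpus. It glues multi-word
units with tildes when building the factored training lines, and applies the
same gluing to plain input text before decoding.

```python
def glue(surface, sep="~"):
    """Join the whitespace-separated parts of a token with `sep`."""
    return sep.join(surface.split())


def to_factored(triples):
    """Build a '<word>|<lemma>|<tag> ...' line from (word, lemma, tag) triples.

    Multi-word surface forms such as "a la derecha" become "a~la~derecha",
    so every token on the line carries all of its factors.
    """
    return " ".join(f"{glue(w)}|{glue(l)}|{t}" for w, l, t in triples)


def glue_known_phrases(sentence, phrases, sep="~"):
    """Glue known multi-word phrases in plain text, trying longest first.

    Matching is exact (case-sensitive) on whitespace-separated tokens.
    """
    tokens = sentence.split()
    # Skip empty phrases so the scan below always advances.
    candidates = sorted(
        (p.split() for p in phrases if p.split()), key=len, reverse=True
    )
    out = []
    i = 0
    while i < len(tokens):
        for cand in candidates:
            if tokens[i:i + len(cand)] == cand:
                out.append(sep.join(cand))
                i += len(cand)
                break
        else:
            out.append(tokens[i])
            i += 1
    return " ".join(out)
```

For instance, to_factored([("a la derecha", "a~la~derecha", "adv")]) yields
"a~la~derecha|a~la~derecha|adv", and glue_known_phrases("gire a la derecha",
["a la derecha"]) yields "gire a~la~derecha", so training data and input end
up segmented the same way.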

-phi

On Thu, Feb 12, 2009 at 1:48 PM, Michael Zuckerman
<[email protected]> wrote:
> Thank you for your answer. However, I still don't understand something. If
> such a phrase appears in the input to be translated, then Moses will not know
> that it is equal to the phrase with tildes.
>
> Michael.
>
> On Thu, Feb 12, 2009 at 3:28 PM, Philipp Koehn <[email protected]> wrote:
>>
>> Hi,
>>
>> One thing you can do here is to change the tokenization scheme based
>> on the TreeTagger output, i.e. make a~la~derecha one word (using the
>> tildes, for instance, to glue the parts together).
>>
>> -phi
>>
>> On Thu, Feb 12, 2009 at 1:10 PM, Michael Zuckerman
>> <[email protected]> wrote:
>> > Hello,
>> >
>> > We are trying to run factored training on a Spanish corpus. We first tag the
>> > corpus with TreeTagger, change the format to "<word>|<lemma>|<tag>
>> > <word>|<lemma>|<tag> ...", and then run the script
>> > train-factored-phrase-model.perl on it. The problem arises when there are
>> > phrases which are treated by TreeTagger as one word, for example
>> > "a la derecha|a~la~derecha|adv". Then train-factored-phrase-model.perl says
>> > that no factor was found for the word "a" and for the word "la" in the file.
>> > Is there a way to tell the script that "a la derecha" should be treated as
>> > one word?
>> >
>> > Thanks,
>> >      Michael.
>> >
>> > _______________________________________________
>> > Moses-support mailing list
>> > [email protected]
>> > http://mailman.mit.edu/mailman/listinfo/moses-support
>> >
>> >
>
>