Hi,

Another, and maybe cleaner, fix is to run TreeTagger and all your
preprocessing steps on the input text as well, not just on the training
data, so that both are segmented in the same way.
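
To make the idea concrete, here is a rough sketch of one shared conversion
step that could be applied to both the training corpus and the text to be
translated. It is only an illustration, not part of Moses or TreeTagger: the
function names are made up, the "verb" tag in the demo is invented, and it
assumes the tagger output has one token per line as word<TAB>tag<TAB>lemma
(please check the column order against your own TreeTagger invocation).

```python
# Hypothetical helpers: turn TreeTagger-style rows into Moses factored tokens.
# Assumption: each row is "word<TAB>tag<TAB>lemma"; adjust the unpacking below
# if your tagger output uses a different column order.

GLUE = "~"  # same glue character as in the a~la~derecha example


def to_factored_token(row: str) -> str:
    """Convert one tagger row into a single word|lemma|tag token."""
    word, tag, lemma = row.rstrip("\n").split("\t")
    # A multiword unit such as "a la derecha" must not contain spaces,
    # otherwise it is read as three words with factors on the last one only.
    word = GLUE.join(word.split())
    lemma = GLUE.join(lemma.split())
    return "|".join((word, lemma, tag))


def to_factored_sentence(rows) -> str:
    """Join the rows of one sentence into one line of factored text."""
    return " ".join(to_factored_token(r) for r in rows if r.strip())


def surface_only(factored_line: str) -> str:
    """Keep only the first factor of every token (the glued surface form)."""
    return " ".join(tok.split("|")[0] for tok in factored_line.split())


if __name__ == "__main__":
    rows = ["gire\tverb\tgirar", "a la derecha\tadv\ta~la~derecha"]
    line = to_factored_sentence(rows)
    print(line)                # factored form, e.g. for training
    print(surface_only(line))  # glued surface form of the same sentence
```

Because the same function glues the multiword units in both places, the
input will contain a~la~derecha exactly as the training data does. The sketch
does not handle tokens that themselves contain "|".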

O.

Philipp Koehn wrote:
> Hi,
> 
> yes that is true - it requires that your input is segmented in the same
> way. This may or may not be a problem. One work-around would be
> to use the lattice format to provide multiple segmentations.
> 
> -phi
> 
> On Thu, Feb 12, 2009 at 1:48 PM, Michael Zuckerman
> <[email protected]> wrote:
>> Thank you for your answer. However, I still don't understand something. If
>> the same phrase appears in the input to translate, then Moses will not know
>> that it is equal to the phrase with tildes.
>>
>> Michael.
>>
>> On Thu, Feb 12, 2009 at 3:28 PM, Philipp Koehn <[email protected]> wrote:
>>> Hi,
>>>
>>> one thing you can do here is to change the tokenization scheme based
>>> on the TreeTagger output, i.e. make a~la~derecha one word (using the
>>> tildes, for instance, to glue the parts together).
>>>
>>> -phi
>>>
>>> On Thu, Feb 12, 2009 at 1:10 PM, Michael Zuckerman
>>> <[email protected]> wrote:
>>>> Hello,
>>>>
>>>> We are trying to run factored training on a Spanish corpus. We first tag
>>>> the corpus with TreeTagger, change the format to "<word>|<lemma>|<tag>
>>>> <word>|<lemma>|<tag> ...", and then run the script
>>>> train-factored-phrase-model.perl on it. The problem arises when there
>>>> are phrases which TreeTagger treats as one word, for example
>>>> "a la derecha|a~la~derecha|adv". Then train-factored-phrase-model.perl
>>>> says that no factor was found for the word "a" and for the word "la" in
>>>> the file. Is there a way to tell the script that "a la derecha" should be
>>>> treated as one word?
>>>>
>>>> Thanks,
>>>>      Michael.
>>>>
>>>> _______________________________________________
>>>> Moses-support mailing list
>>>> [email protected]
>>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>>>
>>>>
>>

-- 
Ondrej Bojar (mailto:[email protected])
http://www.cuni.cz/~obo
