Re: [Moses-support] phrase extraction step

Philipp Koehn Sun, 31 Mar 2013 09:35:08 -0700

Hi,

Since something goes wrong in the phrase extraction step, please try running
the commands by hand to see where it fails. The commands are
printed to STDERR by that step.
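
A minimal shell sketch of that by-hand procedure, assuming a hypothetical
helper called run_step (it is not part of Moses). It echoes each command,
runs it, and reports a non-zero exit status, so the first failing command
is easy to spot.

```shell
# run_step: echo a command, run it, and report a non-zero exit status.
# Returns the command's own exit status so the caller can stop on failure.
run_step() {
    echo "+ $*" >&2
    "$@"
    rc=$?
    if [ "$rc" -ne 0 ]; then
        echo "FAILED (exit $rc): $*" >&2
    fi
    return "$rc"
}

# Hypothetical usage: paste the commands from STDERR one at a time, e.g.
#   run_step ln -s ./model/extract.sorted.gz ./model/tmp.7551/extract.0.gz || exit 1
```

After each step it is worth checking that the expected output file exists
and is not empty before moving on to the next command.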


In your case:

/home/nikhila/project/mosesdecoder/scripts/generic/score-parallel.perl 1
"sort    " /home/nikhila/project/mosesdecoder/scripts/../bin/score
./model/extract.sorted.gz ./model/lex.f2e ./model/phrase-table.half.f2e.gz
0

ln -s ./model/extract.sorted.gz ./model/tmp.7551/extract.0.gz

/home/nikhila/project/mosesdecoder/scripts/../bin/score
./model/tmp.7551/extract.0.gz ./model/lex.f2e
./model/tmp.7551/phrase-table.half.00000.gz

./model/tmp.7551/run.0.shmv ./model/tmp.7551/phrase-table.half.00000.gz

From looking at this, my guess is that there is a problem with
not specifying full paths, but rather using "." as the root directory.
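
One way relative paths can bite, sketched in a scratch directory with
made-up file names: a symlink created with a relative target is resolved
relative to the directory that holds the link, not the directory the
command was run from. Whether this is what actually happened in the run
above is only a guess.

```shell
# symlink_demo: compare a relative and an absolute symlink target.
# Runs in a subshell so the caller's working directory is not changed.
symlink_demo() (
    demo_dir=$(mktemp -d) || exit 1
    cd "$demo_dir" || exit 1
    mkdir -p model/tmp.demo
    echo "data" > model/extract.sorted

    # Relative target: looked up as model/tmp.demo/model/extract.sorted.
    ln -s ./model/extract.sorted model/tmp.demo/relative
    # Absolute target: resolves the same way from any directory.
    ln -s "$PWD/model/extract.sorted" model/tmp.demo/absolute

    for link in relative absolute; do
        if [ -e "model/tmp.demo/$link" ]; then
            echo "$link: ok"
        else
            echo "$link: dangling"
        fi
    done

    cd / && rm -rf "$demo_dir"
)

symlink_demo
# prints:
#   relative: dangling
#   absolute: ok
```

If that is the cause, passing an absolute path as the root directory
(for example the output of pwd) instead of "." may be enough to avoid it.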

-phi


On Fri, Mar 29, 2013 at 7:25 AM, Nikhila Achukatla <
[email protected]> wrote:

> Hi,
>
> Yes, the alignment file is generated correctly.
> No, my data doesn't contain any special characters.
> I ran each step in isolation and I attached them.
> Please check them once.
> Already in the fifth step, the extract files are not generated.
> I cleaned the data before proceeding.
> I am working on Telugu (an Indian language).
> Does Moses support such languages?
>
> I also ran it with the data provided on the Moses website,
> and the same problem occurred.
> The phrase-table.gz, extract.sorted.gz, and extract.inv.gz files are empty,
> and the extract.o.sorted.gz file is not created at all.
>
> Does it require any extra software to be installed?
>
>
> On 28 March 2013 09:30, Philipp Koehn <[email protected]> wrote:
>
>> Hi,
>>
>> > I am attaching a file that I got when I executed the 5th step.
>> > I don't know why the phrase table, extract.sorted.gz, etc. files are not created.
>> > Please help me.
>>
>> What do the input files to the extract step look like? Is the
>> word alignment file correct, and does it have the same number of
>> lines as the others?
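
A small sketch of the line-count check, assuming a hypothetical helper
called same_line_count. The file names in the usage comment are
placeholders and have to be replaced by the actual corpus and alignment
files.

```shell
# same_line_count FILE...: print each file's line count and succeed only
# if all files have the same number of lines.
same_line_count() {
    expected=""
    for f in "$@"; do
        [ -r "$f" ] || { echo "cannot read $f" >&2; return 2; }
        n=$(wc -l < "$f" | tr -d '[:space:]')
        echo "$f: $n lines"
        if [ -z "$expected" ]; then
            expected=$n
        elif [ "$n" -ne "$expected" ]; then
            return 1
        fi
    done
    return 0
}

# Hypothetical usage (placeholder paths):
#   same_line_count corpus.src corpus.tgt alignment.file || echo "line counts differ"
```
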
>>
>> Do you have any forbidden characters (especially "|") in your
>> data that may cause problems?
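
A quick way to look for "|" in the data, sketched with a hypothetical
helper called count_pipe_lines. Moses uses "|" to separate factors and
"|||" as a field delimiter in the phrase table, which is why a literal "|"
in the corpus can cause trouble.

```shell
# count_pipe_lines FILE: print the number of lines containing a literal "|".
# grep exits with status 1 when nothing matches; that is not treated as an error.
count_pipe_lines() {
    grep -c -F '|' "$1" || [ $? -eq 1 ]
}

# Hypothetical usage (placeholder path):
#   count_pipe_lines corpus.src
# To see the offending lines with their line numbers:
#   grep -n -F '|' corpus.src
```
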
>>
>> You can run each step in isolation by running train-model.perl
>> with the --first-step and --last-step switches.
>> The step numbers are listed here:
>> http://www.statmt.org/moses/?n=FactoredTraining.HomePage
>>
>> A common mistake is forgetting to clean the parallel corpus
>> (throwing out long sentences and length-mismatched sentence pairs).
>> This leads to faulty word alignment, which in turn causes
>> phrase extraction to fail.
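
To illustrate what such cleaning does, here is a rough sketch with a
hypothetical helper called clean_pairs. It is not the cleaning script that
ships with Moses, and the length limit and ratio are arbitrary
illustration values. It assumes the sentences contain no tab characters.

```shell
# clean_pairs SRC TGT MAX RATIO: print tab-separated sentence pairs, dropping
# pairs with an empty side, more than MAX tokens on either side, or a token
# count on one side that exceeds RATIO times the other.
clean_pairs() {
    paste "$1" "$2" | awk -F '\t' -v max="$3" -v ratio="$4" '
        {
            ns = split($1, s, " ")   # token count, source side
            nt = split($2, t, " ")   # token count, target side
            if (ns < 1 || nt < 1) next
            if (ns > max || nt > max) next
            if (ns > nt * ratio || nt > ns * ratio) next
            print
        }'
}

# Hypothetical usage (placeholder paths), splitting the pairs back into two files:
#   clean_pairs corpus.src corpus.tgt 80 9 > pairs.tsv
#   cut -f1 pairs.tsv > clean.src
#   cut -f2 pairs.tsv > clean.tgt
```

Filtering both sides together keeps the two files line-aligned, which is
what the word alignment step depends on.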
>>
>> > I also want to know about the tokenization step.
>> > Besides dividing a sentence into tokens, is any
>> > extra processing done in that step?
>>
>> A typical additional step is lowercasing or truecasing, which
>> normalizes words that occur at the beginning of the sentence ("The")
>> or in all caps ("THE") to a common form ("the").
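
A tiny sketch of the lowercasing idea, using plain tr rather than the
Moses scripts. The helper name lowercase is made up for this example. Some
tr implementations work byte by byte and may leave non-ASCII letters
unchanged, so this should be read as an ASCII-only illustration.

```shell
# lowercase: map upper-case letters on stdin to lower case.
lowercase() {
    tr '[:upper:]' '[:lower:]'
}

printf 'The house\nTHE HOUSE\nthe house\n' | lowercase
# prints "the house" three times
```

Truecasing differs in that it keeps the capitalization of words that are
normally capitalized, such as names, instead of lowercasing everything.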
>>
>> -phi
>>
>> On Thu, Mar 28, 2013 at 6:14 AM, Nikhila Achukatla
>> <[email protected]> wrote:
>> > Hi,
>> >
>>
>> >
>> > _______________________________________________
>> > Moses-support mailing list
>> > [email protected]
>> > http://mailman.mit.edu/mailman/listinfo/moses-support
>> >
>>
>
>
