Dear Shaimaa,

I don't understand which files are supposed to form the word-aligned 
sentence-parallel training corpus. I expected three files with the exact same 
number of lines, but the alignment file has only 4 lines while the corpora have 
29 lines:

4       aligned.grow-diag-final
29      car-ready2016-2.de
29      car-ready2016-2.en
45      phrase-table.gz
37      verbose.docx
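
The same check can be scripted. Below is a minimal sketch in Python, not part of Moses; the helper names count_lines and check_parallel are made up for illustration, and the file names are the ones listed above.

```python
from pathlib import Path


def count_lines(path):
    """Return the number of lines in a text file."""
    with open(path, encoding="utf-8") as handle:
        return sum(1 for _ in handle)


def check_parallel(paths):
    """Return (counts, ok): per-file line counts and whether they all agree."""
    counts = {str(p): count_lines(p) for p in paths}
    ok = len(set(counts.values())) <= 1
    return counts, ok


if __name__ == "__main__":
    # File names taken from the listing above; skipped if they are not present.
    files = ["car-ready2016-2.en", "car-ready2016-2.de", "aligned.grow-diag-final"]
    if all(Path(f).exists() for f in files):
        counts, ok = check_parallel(files)
        for name, n in counts.items():
            print(f"{n}\t{name}")
        print("line counts match" if ok else "line counts DIFFER")
```

With the files as attached, this would report a mismatch (4 vs. 29 lines).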

I'm attaching my script to visually present alignments like this:

  this █ - - - - 
   car - █ - - - 
   was - - █ - - 
stolen - - - █ - 
     . - - - - █ 
       dieses  .
         auto
           wurde
             gestohlen

To get the output, use:

paste car-ready2016-2.en car-ready2016-2.de aligned.grow-diag-final | alitextview.pl | less
(I might have swapped the languages.)
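
For readers without the attachment, the idea behind the picture can be sketched in a few lines of Python. This is not alitextview.pl itself, only an approximation of its output: the function names are made up, the column labels are laid out more simply, and I assume the alignment line consists of "i-j" index pairs where i indexes the row tokens and j the column tokens (which language is which depends on your training direction).

```python
def parse_alignment(line):
    """Parse 'i-j' pairs such as '0-0 1-1' into a set of (i, j) index tuples."""
    pairs = set()
    for item in line.split():
        left, right = item.split("-")
        pairs.add((int(left), int(right)))
    return pairs


def render_alignment(rows, cols, pairs):
    """Return a text grid: one line per row token, one column per column token."""
    width = max((len(tok) for tok in rows), default=0)
    lines = []
    for i, tok in enumerate(rows):
        cells = ["█" if (i, j) in pairs else "-" for j in range(len(cols))]
        lines.append(tok.rjust(width) + " " + " ".join(cells))
    # Column labels: one per line, indented so each starts under its column.
    for j, tok in enumerate(cols):
        lines.append(" " * (width + 1 + 2 * j) + tok)
    return "\n".join(lines)


# Example sentence pair from the picture above, with a diagonal alignment.
english = "this car was stolen .".split()
german = "dieses auto wurde gestohlen .".split()
print(render_alignment(english, german, parse_alignment("0-0 1-1 2-2 3-3 4-4")))
```

A sentence without any alignment points shows up as a grid of dashes only, which makes missing alignments easy to spot.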

If your training corpus now consists of the files car-ready2016-2.en, 
car-ready2016-2.de and aligned.grow-diag-final, then obviously "ich verkaufe" 
cannot be translated: it is never aligned to anything in the training data.
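
To see which training sentences are affected, something like the following Python sketch can help. It is a hypothetical helper, not a Moses tool: it assumes the corpus and the alignment file are read into lists of lines, that line k of the alignment file belongs to sentence k of the corpus, and it matches the phrase on lowercased whitespace tokens.

```python
def contains_phrase(tokens, phrase_tokens):
    """True if phrase_tokens occurs as a contiguous run inside tokens."""
    n = len(phrase_tokens)
    return any(tokens[i:i + n] == phrase_tokens for i in range(len(tokens) - n + 1))


def occurrences_without_alignment(source_lines, alignment_lines, phrase):
    """Return 0-based indices of sentences that contain the phrase but have
    no (or an empty) line in the alignment file."""
    phrase_tokens = phrase.lower().split()
    missing = []
    for idx, line in enumerate(source_lines):
        if not contains_phrase(line.lower().split(), phrase_tokens):
            continue
        has_alignment = idx < len(alignment_lines) and alignment_lines[idx].strip()
        if not has_alignment:
            missing.append(idx)
    return missing


# Toy data: two sentences, but only the first one has an alignment line.
source = ["das auto wurde verkauft", "ich verkaufe das auto"]
alignments = ["0-0 1-1 2-2 3-3"]
print(occurrences_without_alignment(source, alignments, "ich verkaufe"))
```

Every index it reports is a sentence from which no phrase pairs can be extracted.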

Cheers, O.

----- Original Message -----
> From: "Shaimaa Marzouk" <[email protected]>
> To: [email protected], "Ondrej Bojar" <[email protected]>
> Sent: Wednesday, 6 January, 2016 18:27:02
> Subject: Re: [Moses-support] Factored instead of Phrase-based Model?

> Dear Ondrej & Moses-Team,
> 
> @Ondrej: thanks a lot for your quick feedback.
> 
> The phrase "ich habe" does not appear in the phrase table. The word alignment
> file includes only the first 4 sentences of the training data.
> 
> I have put the sentence (ich habe das auto verkauft) into a separate "in"
> file, but got the same result. I also tried another sentence (ich verkaufe das
> auto); here, too, "ich verkaufe" cannot be translated. I repeated the exact
> sentence (ich verkaufe das auto) many times in the training data and still get
> the same result.
> I attach the word alignment, phrase table, training data and verbose output,
> and would be very grateful for any tip.
> 
> I would also highly appreciate it if you could let me know where I can find
> information about
> 1.   how to prepare the training data with additional factors before training
> the Factored Model?
> 2.   how to train a Language Model that considers the POS?
> 
> I think that sooner or later the sentences will get more complex and I will
> need to work with a Factored Model.
> 
> 
> Many Thanks
> Shaimaa
> 
> --------------------------------------------
> Ondrej Bojar <[email protected]> wrote on Wed, 6 Jan 2016:
> 
> Subject: Re: [Moses-support] Factored instead of Phrase-based Model?
> To: "Shaimaa Marzouk" <[email protected]>, "Shaimaa Marzouk"
> <[email protected]>, [email protected]
> Date: Wednesday, 6 January 2016, 08:42
> 
> Dear Shaimaa,
> 
> Adding factors can only increase any out-of-vocabulary issues.
> 
> Use -v (perhaps even a higher verbosity level) in moses to see which
> translation options are considered for the problematic sentence. There could
> be some unfortunate weight settings that for some reason prefer the identity
> translation. (The identity translation must, however, appear in the data, or
> the source word must not appear in the data; otherwise Moses would not
> produce the identity translation at all.)
> 
> Then go back to the phrase table and manually search for the lines that are
> supposed to cover the missing words. Here you may find the identity entries.
> 
> Then go back and check the word alignment this (test) sentence got in the
> training data. There are most likely some issues with the alignment that
> prevented proper translations from being extracted.
> 
> Best, Ondrej.
>  
> 
> On January 6, 2016 4:48:26 AM CET, Shaimaa Marzouk <[email protected]> wrote:
> >Dear Moses-Team,
> >
> >I am trying to translate two short sentences included in the same file
> >from German into English using a “Phrase-based Model”. The first
> >sentence (das auto wurde verkauft) is translated correctly, while the
> >second is only partly translated.
> >
> >For “ich habe das auto verkauft” I receive the result:
> >Ich|UNK|UNK|UNK habe|UNK|UNK|UNK the car sold  [11111]
> >[total=-203.330]   core=(-200.000, -5.000, 5.000, 0.000, 0.000, 0.000,
> >0.000, 0.000, -18.660)
> >
> >I tried to modify the training data in different ways, and at last
> >included the exact sentence (along with its translation) in the
> >training data (see attachment). But I still get the same result.
> >
> >Do I need to use a “Factored Translation Model” instead of the
> >“Phrase-based Model” to be able to translate this sentence? If yes, I
> >found an explanation of how to train Factored Models here:
> >http://www.statmt.org/moses/?n=Moses.FactoredTutorial
> >Could you please tell me where I can find information about
> >1.    how to prepare the training data with additional factors before
> >training the Factored Model?
> >2.    how to train a Language Model that considers the POS?
> >
> >I currently use KenLM and Giza++.
> >
> >Thanks a lot for your support.
> >
> >Kind regards,
> >Shaimaa
> >
> >------------------------------------------------------------------------
> >
> >_______________________________________________
> >Moses-support mailing list
> >[email protected]
> >http://mailman.mit.edu/mailman/listinfo/moses-support
> 
> --
> Ondrej Bojar (mailto:[email protected] / [email protected])
> http://www.cuni.cz/~obo

-- 
Ondrej Bojar (mailto:[email protected] / [email protected])
http://www.cuni.cz/~obo

Attachment: alitextview.pl
Description: Perl program
