Dear Shaimaa, I don't understand which files are supposed to form the word-aligned sentence-parallel training corpus. I expected three files with the exact same number of lines, but the alignment file has only 4 lines while the corpora have 29 lines:
4 aligned.grow-diag-final
29 car-ready2016-2.de
29 car-ready2016-2.en
45 phrase-table.gz
37 verbose.docx
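The same consistency check can be scripted. This is only a minimal sketch: the helper names count_lines and check_parallel are made up for illustration, and the file names are simply the ones from the listing above.

```python
from pathlib import Path


def count_lines(path):
    """Count newline characters in a file, which is what `wc -l` reports."""
    with open(path, "rb") as handle:
        return sum(chunk.count(b"\n") for chunk in iter(lambda: handle.read(1 << 16), b""))


def check_parallel(paths):
    """Return (ok, counts); ok is True only if every file has the same line count."""
    counts = {str(p): count_lines(p) for p in paths}
    return len(set(counts.values())) == 1, counts


if __name__ == "__main__":
    # File names taken from the listing above; skipped silently if they are absent.
    files = ["car-ready2016-2.en", "car-ready2016-2.de", "aligned.grow-diag-final"]
    if all(Path(f).exists() for f in files):
        ok, counts = check_parallel(files)
        for name, n in counts.items():
            print(f"{n:6d} {name}")
        print("consistent" if ok else "MISMATCH: line counts differ")
```

For a usable training corpus, the two text files and the alignment file should all report the same count.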
I'm attaching my script to visually present alignments like this:
this   █ - - - -
car    - █ - - -
was    - - █ - -
stolen - - - █ -
.      - - - - █
(columns, left to right: dieses auto wurde gestohlen .)
To get the output, use:
paste car-ready2016-2.en car-ready2016-2.de aligned.grow-diag-final |
alitextview.pl | less
(I might have swapped the languages.)
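For readers without the attached script, a grid like the one above can be approximated in a few lines. This is a rough sketch, not a port of alitextview.pl: the function names are hypothetical, the alignment line is assumed to use the usual "i-j" index pairs, and the first index is assumed to refer to the column language (swap the pair if the picture looks transposed). The example alignment is invented to match the grid shown above.

```python
def parse_alignment(line):
    """Parse space-separated 'i-j' pairs, e.g. '0-0 1-1', into a set of (i, j) tuples."""
    return {tuple(int(x) for x in pair.split("-")) for pair in line.split()}


def render_alignment(row_words, col_words, points):
    """Draw one row per row_words token and one column per col_words token.

    points holds (column_index, row_index) pairs; that ordering is an assumption.
    """
    width = max(len(w) for w in row_words)
    lines = []
    for r, word in enumerate(row_words):
        cells = ["█" if (c, r) in points else "-" for c in range(len(col_words))]
        lines.append(f"{word.ljust(width)} {' '.join(cells)}")
    legend = ", ".join(f"{c + 1}={w}" for c, w in enumerate(col_words))
    lines.append(f"columns: {legend}")
    return "\n".join(lines)


if __name__ == "__main__":
    english = "this car was stolen .".split()
    german = "dieses auto wurde gestohlen .".split()
    # Hypothetical alignment line, chosen to reproduce the diagonal above.
    print(render_alignment(english, german, parse_alignment("0-0 1-1 2-2 3-3 4-4")))
```

A sentence pair with no line in the alignment file has no grid to draw at all, which is the situation described here for everything past the fourth sentence.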
If your training corpus now consists of the files car-ready2016-2.en,
car-ready2016-2.de and aligned.grow-diag-final, then obviously "ich verkaufe"
cannot be translated: it is never aligned to anything in the training data.
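This can be checked mechanically. The sketch below is only an illustration with a made-up function name; it assumes whitespace-tokenized text, exact (case-sensitive) token matching, and that line k of the alignment file belongs to sentence k of the corpus, so that sentences beyond the end of the alignment file count as unaligned.

```python
def phrase_coverage(source_lines, alignment_lines, phrase):
    """Return (covered, uncovered): 1-based numbers of the sentences containing
    `phrase`, split by whether a non-empty alignment line exists for them."""
    tokens = phrase.split()
    covered, uncovered = [], []
    for number, sentence in enumerate(source_lines, start=1):
        words = sentence.split()
        hit = any(
            words[i:i + len(tokens)] == tokens
            for i in range(len(words) - len(tokens) + 1)
        )
        if not hit:
            continue
        has_alignment = (
            number <= len(alignment_lines)
            and alignment_lines[number - 1].strip() != ""
        )
        (covered if has_alignment else uncovered).append(number)
    return covered, uncovered


if __name__ == "__main__":
    import os

    # File names as used in this thread; skipped if they are not present.
    if os.path.exists("car-ready2016-2.de") and os.path.exists("aligned.grow-diag-final"):
        with open("car-ready2016-2.de", encoding="utf-8") as src:
            source = src.read().splitlines()
        with open("aligned.grow-diag-final", encoding="utf-8") as ali:
            alignment = ali.read().splitlines()
        print(phrase_coverage(source, alignment, "ich verkaufe"))
```

If the first list comes back empty, no phrase pair containing the phrase can have been extracted, however often the sentence is repeated in the text files.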
Cheers, O.
----- Original Message -----
> From: "Shaimaa Marzouk" <[email protected]>
> To: [email protected], "Ondrej Bojar" <[email protected]>
> Sent: Wednesday, 6 January, 2016 18:27:02
> Subject: Re: [Moses-support] Factored instead of Phrase-based Model?
> Dear Ondrej & Moses-Team,
>
> @Ondrej: thanks a lot for your quick feedback.
>
> The phrase "ich habe" does not appear in the phrase table. The word alignment
> file includes only the first 4 sentences of the training data.
>
> I have put the sentence (ich habe das auto verkauf) in a separate "in"
> file, but got the same result. I also tried another sentence (ich verkaufe das
> auto); here too, "ich verkaufe" cannot be translated. I repeated the exact
> sentence (ich verkaufe das auto) many times in the training data and still get
> the same result.
> I attach the word alignment, phrase table, training data and verbose result,
> and would be very grateful to receive any tip.
>
> I would also highly appreciate it if you could let me know where I can find
> information about
> 1. how to prepare the training data with additional factors, before training
> the Factored Model?
> 2. how to train the Language Model that considers the POS?
>
> I think that sooner or later, the sentences will get more complex and I would
> need to work with a Factored Model.
>
>
> Many Thanks
> Shaimaa
>
> --------------------------------------------
> Ondrej Bojar <[email protected]> wrote on Wed, 6 Jan 2016:
>
> Subject: Re: [Moses-support] Factored instead of Phrase-based Model?
> To: "Shaimaa Marzouk" <[email protected]>, "Shaimaa Marzouk"
> <[email protected]>, [email protected]
> Date: Wednesday, 6 January, 2016 08:42
>
> Dear Shaimaa,
>
> Adding factors can only increase any out-of-vocabulary issues.
>
> Use -v (perhaps even a higher verbosity level) in moses to see which
> translation options are considered for the problematic sentence. There could
> be some unfortunate weight settings that for some reason prefer identity
> translation. (The identity translation must however appear in the data, or
> the source word must not appear in the data, otherwise Moses would not
> produce identity translation at all.)
>
> Then go back to the phrase table and manually search for the lines that are
> supposed to cover the missing words. Here you may find the identity entries.
>
> Then go back and check the word alignment this (test) sentence got in the
> training data. There are most likely some issues with the alignment that
> prevented proper translations from being extracted.
>
> Best, Ondrej.
>
>
> On January 6, 2016 4:48:26 AM CET, Shaimaa Marzouk <[email protected]> wrote:
> >Dear Moses-Team,
> >
> >I am trying to translate two short sentences included in the same file
> >from German into English using a “Phrase-based Model”. The first
> >sentence (das auto wurde verkauft) is translated correctly, while the
> >second is partly translated:
> >
> >I receive as a result for “ich habe das auto verkauft”
> >Ich|UNK|UNK|UNK habe|UNK|UNK|UNK the car sold [11111]
> >[total=-203.330] core=(-200.000, -5.000, 5.000, 0.000, 0.000, 0.000,
> >0.000, 0.000, -18.660)
> >
> >I tried to modify the training data in different ways, and at last
> >included the exact sentence (along with its translation) in the
> >training data (see attachment). But, I still get the same result.
> >
> >Do I need to use a “Factored Translation Model” instead of the
> >“Phrase-based Model” to be able to translate this sentence? If yes, I
> >find here http://www.statmt.org/moses/?n=Moses.FactoredTutorial
> >explanation of how to train Factored Models. Could you please tell me,
> >where can I find information about
> >1. how to prepare the training data with additional factors, before
> >training the Factored Model?
> >2. how to train the Language Model that considers the POS?
> >
> >I currently use KenLM and Giza++.
> >
> >Thanks a lot for your support.
> >
> >Kind regards,
> >Shaimaa
> >
> >------------------------------------------------------------------------
> >
> >_______________________________________________
> >Moses-support mailing list
> >[email protected]
> >http://mailman.mit.edu/mailman/listinfo/moses-support
>
> --
> Ondrej Bojar (mailto:[email protected] / [email protected])
> http://www.cuni.cz/~obo
--
Ondrej Bojar (mailto:[email protected] / [email protected])
http://www.cuni.cz/~obo
alitextview.pl
Description: Perl program
