Hi Sir,

Thank you for replying to my mail. Yes, I have thought about this solution
for the alignments, but the heuristics used in Moses got me thinking, and I
wanted to use the heuristic to obtain the final alignments (since those
alignments are of higher quality). So, my question is more like: could I
replace the function of GIZA++ alone (computing the alignments in both
directions) with a customized aligner?

Regarding the next question: when we augment the parallel corpus with
entries from a bilingual dictionary, the alignments are computed over the
entire corpus. Now, the probability that a word s in the source language is
translated to an MWE t1 t2 t3 in the target language needs to be computed.
Initially, GIZA++ would treat the event of s being translated to t1 as
equally likely as the event of s being translated to t1 t2 t3. Because of
this, even after GIZA++ completes its EM iterations, the probabilities of
impossible events, such as s being translated to t1 alone or t2 alone, are
not zero. I hope I am not being too vague about this problem.
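A toy IBM Model 1 run shows what I mean. The corpus below is a single made-up dictionary-style pair (one source word against a three-word MWE); EM leaves a third of the probability mass on each MWE component, so t(t1|s) never goes to zero:

```python
from collections import defaultdict

# Toy IBM Model 1 EM on one dictionary-style sentence pair, illustrating
# how translation probability spreads over the parts of an MWE. The corpus
# and vocabulary are invented for the example.
corpus = [(["s"], ["t1", "t2", "t3"])]

t = defaultdict(lambda: 1.0 / 3)          # uniform initialisation of t(tgt|src)

for _ in range(20):                        # EM iterations
    count = defaultdict(float)
    total = defaultdict(float)
    for src, tgt in corpus:
        for tw in tgt:
            z = sum(t[(tw, sw)] for sw in src)   # normalise per target word
            for sw in src:
                c = t[(tw, sw)] / z              # expected count (E-step)
                count[(tw, sw)] += c
                total[sw] += c
    for (tw, sw), c in count.items():            # re-estimate (M-step)
        t[(tw, sw)] = c / total[sw]

# Each MWE component keeps probability 1/3; none of them is driven to zero.
print(t[("t1", "s")], t[("t2", "s")], t[("t3", "s")])
```

With only this pair in the corpus the distribution is stuck at uniform; in a real corpus the mass shifts with co-occurrence counts, but the individual-component probabilities still stay strictly positive.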

The thing is, using a dictionary did not improve the quality of the
alignments obtained on the same corpora. We worked on the English-Hindi
pair using a 'tourism' corpus. The dictionary is considerably large, with
about 20,000k entries.


Thank you.

- Regards,
Prasanth

On Mon, Nov 29, 2010 at 9:21 PM, Philipp Koehn <[email protected]> wrote:

> Hi,
>
> > I am familiar with the architecture of Moses, and know that the 2nd and
> > 3rd steps involve computing alignments in both directions, while the 4th
> > step applies the heuristic (grow, union, ...) to obtain the final
> > alignments. These alignments are further used to extract the
> > phrase-pairs. Now my question is, what would be the best way to
> > incorporate the alignments into Moses?
> > One way would be to duplicate the files generated by GIZA++ in steps 2
> > and 3, and start the training procedure from step 4. However, I was
> > wondering if there was a simpler method to use the customized
> > alignments in Moses.
>
> If you have your own alignment method, it would be best to skip the
> word alignment steps of the training steps and start with step 4.
> http://www.statmt.org/moses/?n=FactoredTraining.HomePage
>
> > Also, in the process of MT, if I wanted to use a bilingual dictionary,
> > would it be ideal to use the dictionary in GIZA++ while computing the
> > alignments, or to augment the corpus with the entries in the
> > dictionary? Most of the target words for the entries in the dictionary
> > are MWEs, and hence augmenting the corpus did not bring about any
> > improvements when we conducted the experiments. Could you kindly
> > suggest an appropriate method to be used in this context?
>
> I am not sure what the problem is here - the inclusion of a dictionary as
> additional parallel corpus data is the standard method. I am not entirely
> sure why their translations as MWEs should be a problem.
>
> -phi
>



-- 
"Theories have four stages of acceptance. i) this is worthless nonsense; ii)
this is an interesting, but perverse, point of view, iii) this is true, but
quite unimportant; iv) I always said so."

  --- J.B.S. Haldane
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
