Hi, is the idea to replace certain parts of the text with tokens such as DATE and then align the rest of the sentence? I'd suggest simply reformatting the training data: make sure that matching tokens are added to both sides of each sentence pair, and for good measure add 1000 sentence pairs that contain only DATE on both the input and the output side.
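A rough sketch of what that reformatting could look like, in Python. Everything here is an assumption for illustration: the regular expressions are deliberately narrow and only cover dates shaped like the Europarl example in this thread, and the names `tokenize_pair` and `prepare_corpus` are made up, not part of Giza++ or Moses. Dates are replaced only when both sides contain the same number of matches, so the DATE tokens always come in matching pairs.

```python
import re

TOKEN = "DATE"

# Hypothetical, narrow patterns: day + month name, optional weekday and year.
# A real corpus would need much broader coverage (numeric dates, abbreviations, ...).
EN_DATE = re.compile(
    r"\b(?:(?:Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday),? )?"
    r"\d{1,2} "
    r"(?:January|February|March|April|May|June|July|August|September"
    r"|October|November|December)\b"
    r"(?: \d{4})?"
)
DE_DATE = re.compile(
    r"\b(?:(?:Montag|Dienstag|Mittwoch|Donnerstag|Freitag|Samstag|Sonntag),? "
    r"(?:dem |den )?)?"
    r"\d{1,2}\. "
    r"(?:Januar|Februar|März|April|Mai|Juni|Juli|August|September"
    r"|Oktober|November|Dezember)\b"
    r"(?: \d{4})?"
)


def tokenize_pair(src, tgt):
    """Replace dates with DATE on both sides, but only if the counts match.

    If the two sides disagree on the number of dates, the pair is returned
    unchanged so that no unmatched DATE token enters the training data.
    """
    src_new, n_src = EN_DATE.subn(TOKEN, src)
    tgt_new, n_tgt = DE_DATE.subn(TOKEN, tgt)
    if n_src != n_tgt:
        return src, tgt
    return src_new, tgt_new


def prepare_corpus(pairs, n_token_only=1000):
    """Rewrite all sentence pairs and append pairs that contain only DATE."""
    out = [tokenize_pair(src, tgt) for src, tgt in pairs]
    out.extend([(TOKEN, TOKEN)] * n_token_only)
    return out


if __name__ == "__main__":
    en = ("I declare resumed the session of the European Parliament "
          "adjourned on Friday 17 December 1999")
    de = ("Ich erkläre die am Freitag, dem 17. Dezember unterbrochene "
          "Sitzungsperiode des Europäischen Parlaments für wiederaufgenommen")
    print(tokenize_pair(en, de))
```

The rewritten pairs would then be written back out as the usual two parallel files, one sentence per line, before running the normal training pipeline. The DATE-only pairs are there to push the aligner towards aligning DATE with DATE.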
-phi

On Thu, Feb 26, 2009 at 1:38 AM, James Read <[email protected]> wrote:
> Consider the following sentence pair.
>
> I declare resumed the session of the European Parliament adjourned on Friday
> 17 December 1999
>
> Ich erkläre die am Freitag, dem 17. Dezember unterbrochene Sitzungsperiode
> des Europäischen Parlaments für wiederaufgenommen
>
> This sentence can be reduced to the following templates:
>
> I declare resumed the session of the European Parliament adjourned on ___
>
> Ich erkläre die am ___ unterbrochene Sitzungsperiode des Europäischen
> Parlaments für wiederaufgenommen
>
> Given a set of candidate tokens for such template could the current
> implementation of Giza++ figure out which template pairs align or do you
> think the code would need serious modifications?
>
> I hope this made my question clearer.
>
>
> Quoting Philipp Koehn <[email protected]>:
>
>> Hi,
>>
>> not sure, what you are asking for - are you looking for phrasal
>> alignments, in other words frequent occurrences of the example
>> you mention? This is done by the phrase extraction scripts.
>>
>> -phi
>>
>> On Wed, Feb 25, 2009 at 1:04 PM, James Read <[email protected]> wrote:
>>>
>>> Hi,
>>>
>>> thanks to everybody for responses to my query about parallelising
>>> Giza++. All the responses were very useful and have helped the project
>>> make quick progress.
>>>
>>> The greater intention is to use Giza++ to automatically find template
>>> translation pairs
>>>
>>> e.g.
>>>
>>> English - My name is x
>>> Italian - Mi chiamo x
>>>
>>> Does anybody have any ideas about how adaptable Giza++ is in its
>>> current state to learning such pairs? Would it be a simple case of
>>> presenting Giza++ with candidate tokens to align? Or would
>>> modifications to the EM algorithms be necessary to accomplish this?
>>>
>>> Thanks in advance for any suggestions.
>>>
>>> James
>>>
>>> --
>>> The University of Edinburgh is a charitable body, registered in
>>> Scotland, with registration number SC005336.
>>>
>>>
>>> _______________________________________________
>>> Moses-support mailing list
>>> [email protected]
>>> http://mailman.mit.edu/mailman/listinfo/moses-support
