Re: [Moses-support] Phrase extraction

2015-01-19 Thread Hieu Hoang
They are in the files named extract*.gz On 17/01/15 16:26, Cyrine NASRI wrote: Hello, I am looking for a way to access the file that contains the set of extracted phrases, because I need to modify these phrases before building the translation table. Thank you
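For the kind of edit asked about above, a minimal Python sketch follows. It assumes the usual extract line layout of fields separated by " ||| " (source phrase, target phrase, alignment, possibly more); the file names and the edit_phrase hook are hypothetical placeholders, not part of Moses.

```python
import gzip

SEP = " ||| "

def edit_phrase(source, target):
    """Hypothetical hook: return the (possibly modified) pair, or None to drop it."""
    if len(source.split()) > 7:
        return None  # example rule: drop very long source phrases
    return source, target

def rewrite_extract(in_path, out_path, edit=edit_phrase):
    """Copy a gzipped extract file, applying `edit` to each phrase pair.

    Returns the number of lines written.
    """
    kept = 0
    with gzip.open(in_path, "rt", encoding="utf-8") as fin, \
         gzip.open(out_path, "wt", encoding="utf-8") as fout:
        for line in fin:
            fields = line.rstrip("\n").split(SEP)
            if len(fields) < 2:
                raise ValueError(f"unexpected extract line: {line!r}")
            result = edit(fields[0], fields[1])
            if result is None:
                continue
            fields[0], fields[1] = result
            # Any remaining fields (alignment etc.) are passed through unchanged.
            fout.write(SEP.join(fields) + "\n")
            kept += 1
    return kept
```

Whatever change is made would presumably have to be applied consistently to every extract file that the later scoring step reads (for example the inverse one), and the files re-sorted if the edit changes their order.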

Re: [Moses-support] Phrase extraction breaks on unexpected format of aligned.grow-diag-final

2014-10-07 Thread Maarten van Gompel
Quoting Philipp Koehn (2014-10-07 04:57:08) Hi, which version of symal are you using? The one distributed with Moses has not changed, but I am aware that Nicola Bertoldi's online mgiza includes a version of symal with the reported behaviour. You should use the Moses one (in the Moses bin

[Moses-support] Phrase extraction breaks on unexpected format of aligned.grow-diag-final

2014-10-06 Thread Maarten van Gompel
Hi, I'm using the latest git version of Moses, and it seems as if the training pipeline got broken somehow, as the format of aligned.grow-diag-final changed. I'm invoking train-model.perl as follows: /vol/customopt/machine-translation/src/mosesdecoder/scripts/training/train-model.perl

Re: [Moses-support] Phrase extraction breaks on unexpected format of aligned.grow-diag-final

2014-10-06 Thread Philipp Koehn
Hi, which version of symal are you using? The one distributed with Moses has not changed, but I am aware that Nicola Bertoldi's online mgiza includes a version of symal with the reported behaviour. You should use the Moses one (in the Moses bin directory). -phi On Mon, Oct 6, 2014 at 4:00 AM,
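One way to check which symal produced a given alignment file is to validate its format before phrase extraction runs. The sketch below assumes the Moses-style layout of one sentence pair per line, written as whitespace-separated i-j index pairs; the snippets above do not show what the unexpected format looked like, so the function names and the check itself are illustrative only.

```python
import re

# One alignment point, e.g. "0-0" or "12-7".
PAIR = re.compile(r"^\d+-\d+$")

def parse_alignment_line(line):
    """Return [(src_idx, tgt_idx), ...]; raise ValueError on an unexpected token."""
    points = []
    for token in line.split():
        if not PAIR.match(token):
            raise ValueError(f"unexpected alignment token: {token!r}")
        s, t = token.split("-")
        points.append((int(s), int(t)))
    return points

def first_bad_line(path):
    """Return (line_number, message) for the first malformed line, or None."""
    with open(path, encoding="utf-8") as f:
        for n, line in enumerate(f, start=1):
            try:
                parse_alignment_line(line)
            except ValueError as err:
                return n, str(err)
    return None
```

Running first_bad_line on the aligned.grow-diag-final file reports the first line that deviates from the assumed layout, which makes it easier to tell a symal format problem from a corpus problem.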

[Moses-support] Phrase extraction with --IncludeSentenceId messes up phrase table counts

2014-07-23 Thread Marcin Junczys-Dowmunt
Hi, I am using train-model.perl with --extract-options=--IncludeSentenceId and it seems that the sentence id is somehow getting into the phrase table as a count and is later used for phrase translation weight calculation. For instance, the extract (last column is the Id): #c the compound or

Re: [Moses-support] Phrase extraction with --IncludeSentenceId messes up phrase table counts

2014-07-23 Thread Marcin Junczys-Dowmunt
I was planning to use it for a custom feature function later. On 23.07.2014 13:11, Hieu Hoang wrote: I can change it so that the sentence id is put into a key-value field in the last column. What is the sentence id used for? Is it just for debugging purposes? On 23 July 2014 11:36,

Re: [Moses-support] Phrase extraction with --IncludeSentenceId messes up phrase table counts

2014-07-23 Thread Marcin Junczys-Dowmunt
Key-value format would actually be fine. On 23.07.2014 13:12, Marcin Junczys-Dowmunt wrote: I was planning to use it for a custom feature function later. On 23.07.2014 13:11, Hieu Hoang wrote: I can change it so that the sentence id is put into a key-value field in the last column.

Re: [Moses-support] Phrase extraction with --IncludeSentenceId messes up phrase table counts

2014-07-23 Thread Philipp Koehn
Hi, the sentence ID is being used for the domain indicator features. If you run phrase-extract's score and specify a domain file, it then uses the sentence IDs to find out which domain the phrase pair was found in. This is a standard feature in Edinburgh's phrase-based system for the

Re: [Moses-support] Phrase extraction with --IncludeSentenceId messes up phrase table counts

2014-07-23 Thread Hieu Hoang
Ah OK. I thought it was just for debugging. I'm not gonna change it since it's gonna involve months of debugging. Ideally, the extract format should be fixed like the phrase-table, with the last column being key-value pairs. Also, the way the key-value pairs are processed should be automatic

Re: [Moses-support] Phrase extraction with --IncludeSentenceId messes up phrase table counts

2014-07-23 Thread Marcin Junczys-Dowmunt
So, how come this is not damaging the Edinburgh system? On 23.07.2014 17:32, Hieu Hoang wrote: Ah OK. I thought it was just for debugging. I'm not gonna change it since it's gonna involve months of debugging. Ideally, the extract format should be fixed like the phrase-table, with the

Re: [Moses-support] Phrase extraction with --IncludeSentenceId messes up phrase table counts

2014-07-23 Thread Barry Haddow
Because calculating translation probabilities from sentence ids is unexpectedly beneficial? On 23/07/14 16:34, Marcin Junczys-Dowmunt wrote: So, how come this is not damaging the Edinburgh system? On 23.07.2014 17:32, Hieu Hoang wrote: Ah OK. I thought it was just for debugging. I'm

Re: [Moses-support] Phrase extraction with --IncludeSentenceId messes up phrase table counts

2014-07-23 Thread Hieu Hoang
It's likely we're using fractional counts, so there's an extra column. On 23 July 2014 16:34, Marcin Junczys-Dowmunt junc...@amu.edu.pl wrote: So, how come this is not damaging the Edinburgh system? On 23.07.2014 17:32, Hieu Hoang wrote: Ah OK. I thought it was just for debugging. I'm

Re: [Moses-support] Phrase extraction with --IncludeSentenceId messes up phrase table counts

2014-07-23 Thread Marcin Junczys-Dowmunt
In a corpus with sentences sorted by release date this could actually make sense :) On 23.07.2014 17:40, Barry Haddow wrote: Because calculating translation probabilities from sentence ids is unexpectedly beneficial? On 23/07/14 16:34, Marcin Junczys-Dowmunt wrote: So, how come

Re: [Moses-support] Phrase extraction with --IncludeSentenceId messes up phrase table counts

2014-07-23 Thread Philipp Koehn
Hi, this is how extract is called: extract corpus.en corpus.fr align extract 5 --IncludeSentenceId. This is how score is called: score extract lex.f2e phrase-table.half --GoodTuring --DomainIndicator domains.5. The phrase table looks fine to me. -phi On Wed, Jul 23, 2014 at 11:42 AM, Marcin

Re: [Moses-support] Phrase extraction with --IncludeSentenceId messes up phrase table counts

2014-07-23 Thread Marcin Junczys-Dowmunt
So, adding --IgnoreSentenceId to score might fix that without messing up your stuff? I guess I can do that if you can't be bothered, Hieu. On 23.07.2014 17:53, Philipp Koehn wrote: Hi, this is how extract is called: extract corpus.en corpus.fr align extract 5

Re: [Moses-support] Phrase extraction with --IncludeSentenceId messes up phrase table counts

2014-07-23 Thread Barry Haddow
Hi Marcin, it appears that there is an --IgnoreSentenceId argument already, added by Maria during last year's MTM: [gna]bhaddow: git blame ScoreFeature.cpp | grep Ignore bff12363 (maria nadejde 2013-09-13 12:45:46 +0200 42) if (args[i] == "--IgnoreSentenceId") { cheers - Barry On 23/07/14

Re: [Moses-support] Phrase extraction with --IncludeSentenceId messes up phrase table counts

2014-07-23 Thread Hieu Hoang
I was doing it, but mine was a more holistic approach and it would have broken compatibility, so I can't be bothered. On 23 July 2014 16:56, Marcin Junczys-Dowmunt junc...@amu.edu.pl wrote: So, adding --IgnoreSentenceId to score might fix that without messing up your stuff? I guess I can

Re: [Moses-support] Phrase extraction with --IncludeSentenceId messes up phrase table counts

2014-07-23 Thread Marcin Junczys-Dowmunt
Oh. Good! I guess there is a lesson to be learned somewhere. Thanks. On 23.07.2014 18:06, Barry Haddow wrote: Hi Marcin, it appears that there is an --IgnoreSentenceId argument already, added by Maria during last year's MTM: [gna]bhaddow: git blame ScoreFeature.cpp | grep Ignore

Re: [Moses-support] phrase-extraction

2014-03-05 Thread Hieu Hoang
If you mean a tool to create a corpus of aligned sentences, hunalign is supposed to be OK: http://mokk.bme.hu/en/resources/hunalign/ If you mean a tool to create translation rules from source and target sentences with word alignments, then the extract and extract-rules programs in Moses do

Re: [Moses-support] Phrase extraction was: Re: Choosing the optimal alignment

2013-04-19 Thread Per Tunedal
Hi, thanks for all the reading suggestions. Please see below. On Thu, Apr 18, 2013, at 0:13, Philipp Koehn wrote: Hi, On Thu, Apr 4, 2013 at 9:04 AM, Per Tunedal per.tune...@operamail.com wrote: Hi, obviously word alignment is very important for extraction of the most useful phrases. What

[Moses-support] Phrase extraction was: Re: Choosing the optimal alignment

2013-04-04 Thread Per Tunedal
Hi, obviously word alignment is very important for extraction of the most useful phrases. What about other ways of extracting phrases? In your textbook 'Statistical Machine Translation' you do mention another approach: we may also use the expectation maximization algorithm to directly find

[Moses-support] phrase extraction step

2013-03-28 Thread Nikhila Achukatla
Hi, I'm attaching a file. I got it when I executed the 5th step. I don't know why the phrase table, extract.sorted.gz, etc. files are not being produced. Please help me. I also want to know about the tokenization step. In the tokenization step, rather than dividing a sentence into tokens, will any extra processing

Re: [Moses-support] Phrase Extraction Problem

2013-01-31 Thread Cuong Hoang
I did it, following Barry's suggestion. I tested on a very small corpus with 2 sentence pairs and generated 800 bilingual phrases :-D Thanks to you, Barry and Prof. Marcello Federico. On Thu, Jan 31, 2013 at 4:08 AM, Barry Haddow bhad...@staffmail.ed.ac.uk wrote: Hi Cuong If you pass the

Re: [Moses-support] Phrase Extraction Problem

2013-01-30 Thread Marcello Federico
Hi, the total number of extracted phrases in a sentence pair depends on: - the particular word alignment you are considering - the heuristic you adopt for the words left unaligned or aligned with the null word Greetings, Marcello --- Short from my mobile phone On 30/Jan/2013, at 05:46 PM,

Re: [Moses-support] Phrase Extraction Problem

2013-01-30 Thread Barry Haddow
Hi Cuong, if you pass the aligned sentences through your phrase extraction, and through Moses phrase extraction, one at a time, then you should be able to see where the difference is. As Marcello said, it could be in the handling of unaligned words. cheers - Barry On 30/01/13 16:39, Cuong
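To make the point about unaligned words concrete, here is a small Python sketch of alignment-consistent phrase pair extraction for a single sentence pair. It follows the general textbook scheme and is not a reimplementation of the Moses extract program; the function name, the extend_unaligned switch and the max_len default are assumptions made for illustration.

```python
def extract_phrases(src, tgt, alignment, max_len=7, extend_unaligned=True):
    """Return the set of (source phrase, target phrase) pairs consistent with
    `alignment`, a collection of (src_idx, tgt_idx) links.

    With extend_unaligned=True, target phrases are also grown over
    neighbouring unaligned target words, which yields extra pairs.
    """
    aligned_tgt = {t for _, t in alignment}
    pairs = set()
    for s_start in range(len(src)):
        for s_end in range(s_start, min(s_start + max_len, len(src))):
            # Target positions linked to the current source span.
            linked = [t for s, t in alignment if s_start <= s <= s_end]
            if not linked:
                continue
            t_start, t_end = min(linked), max(linked)
            # Consistency check: no target word inside the span may be
            # linked to a source word outside the source span.
            if any(t_start <= t <= t_end and not (s_start <= s <= s_end)
                   for s, t in alignment):
                continue
            lo = t_start
            while True:
                hi = t_end
                while True:
                    if hi - lo < max_len:
                        pairs.add((" ".join(src[s_start:s_end + 1]),
                                   " ".join(tgt[lo:hi + 1])))
                    hi += 1
                    # Stop growing right at an aligned word or sentence end.
                    if not extend_unaligned or hi >= len(tgt) or hi in aligned_tgt:
                        break
                lo -= 1
                # Stop growing left at an aligned word or sentence start.
                if not extend_unaligned or lo < 0 or lo in aligned_tgt:
                    break
    return pairs
```

For src = ["a", "b"], tgt = ["x", "u", "y"] and links {(0, 0), (1, 2)}, where "u" is unaligned, the sketch returns 3 pairs with extend_unaligned=False and 5 with it enabled. Feeding one sentence pair at a time to two extractors and diffing the resulting sets, as suggested above, shows whether a count mismatch comes from this kind of choice.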