they are in the file named
extract*.gz
On 17/01/15 16:26, Cyrine NASRI wrote:
Hello,
I am looking for a way to access the file that contains the set of
extracted phrases, because I need to make some modifications to these
phrases before building the translation table.
Thank you
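
A minimal sketch of how such a file could be edited before the table is built. It assumes each line of the extract file holds fields separated by " ||| " (typically source phrase, target phrase, alignment points); the function names and the edit callback are hypothetical and are not part of Moses.

```python
import gzip

SEP = " ||| "  # assumed field separator in extract files


def parse_extract_line(line):
    """Split one extract line into its fields (assumed: source, target, alignment, ...)."""
    return line.rstrip("\n").split(SEP)


def rewrite_extract(in_path, out_path, edit):
    """Copy a gzipped extract file, passing each line's fields through `edit`.

    `edit` is a hypothetical callback: it receives the list of fields and
    returns a (possibly modified) list, or None to drop the phrase pair.
    """
    with gzip.open(in_path, "rt", encoding="utf-8") as fin, \
         gzip.open(out_path, "wt", encoding="utf-8") as fout:
        for line in fin:
            fields = edit(parse_extract_line(line))
            if fields is not None:
                fout.write(SEP.join(fields) + "\n")
```

Since editing a phrase can change the line order, the rewritten file would probably need to be sorted again before the scoring step is run on it.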
Quoting Philipp Koehn (2014-10-07 04:57:08)
Hi,
which version of symal are you using?
The one distributed with Moses has not changed, but I am aware that
Nicola Bertoldi's online mgiza includes a version of symal with the reported
behaviour. You should use the Moses one (in the Moses bin
Hi,
I'm using the latest git version of moses, and it seems as if the training
pipeline got broken somehow as the format of aligned.grow-diag.final changed.
I'm invoking train-model.perl as follows:
/vol/customopt/machine-translation/src/mosesdecoder/scripts/training/train-model.perl
Hi,
which version of symal are you using?
The one distributed with Moses has not changed, but I am aware that
Nicola Bertoldi's online mgiza includes a version of symal with the reported
behaviour. You should use the Moses one (in the Moses bin directory).
-phi
On Mon, Oct 6, 2014 at 4:00 AM,
Hi,
I am using train-model.perl with
--extract-options=--IncludeSentenceId
and it seems that the sentence id is somehow getting into the phrase
table as a count and later used for phrase translation weight
calculation, for instance the extract (last column is the Id):
#c the compound or
I was planning to use it for a custom feature function later.
On 23.07.2014 13:11, Hieu Hoang wrote:
i can change it so that the sentence id is put into a key-value field
in the last column.
what is the sentence id used for? is it just for debugging purposes?
On 23 July 2014 11:36,
Key-value format would actually be fine.
On 23.07.2014 13:12, Marcin Junczys-Dowmunt wrote:
I was planning to use it for a custom feature function later.
On 23.07.2014 13:11, Hieu Hoang wrote:
i can change it so that the sentence id is put into a key-value field
in the last column.
Hi,
the sentence ID is being used for the domain indicator features.
If you run phrase-extract's score and specify a domain file,
it uses the sentence IDs to find out which domain the
phrase pair was found in.
This is a standard feature in Edinburgh's phrase-based system
for the
ah ok.
I thought it was just for debugging. I'm not gonna change it since it's
gonna involve months of debugging.
Ideally, the extract format should be fixed like the phrase-table, with
the last column being key-value pairs. Also, the way the key-value pairs are
processed should be automatic
So, how come this is not damaging the Edinburgh system?
On 23.07.2014 17:32, Hieu Hoang wrote:
ah ok.
I thought it was just for debugging. I'm not gonna change it since
it's gonna involve months of debugging.
Ideally, the extract format should be fixed like the phrase-table,
with the
Because calculating translation probabilities from sentence ids is
unexpectedly beneficial?
On 23/07/14 16:34, Marcin Junczys-Dowmunt wrote:
So, how come this is not damaging the Edinburgh system?
On 23.07.2014 17:32, Hieu Hoang wrote:
ah ok.
I thought it was just for debugging. I'm
it's likely we're using fractional counts, so there's an extra column
On 23 July 2014 16:34, Marcin Junczys-Dowmunt junc...@amu.edu.pl wrote:
So, how come this is not damaging the Edinburgh system?
On 23.07.2014 17:32, Hieu Hoang wrote:
ah ok.
I thought it was just for debugging. I'm
In a corpus with sentences sorted by release date this could
actually make sense :)
On 23.07.2014 17:40, Barry Haddow wrote:
Because calculating translation probabilities from sentence ids is
unexpectedly beneficial?
On 23/07/14 16:34, Marcin Junczys-Dowmunt wrote:
So, how come
Hi,
this is how extract is called:
extract corpus.en corpus.fr align extract 5 --IncludeSentenceId
this is how score is called:
score extract lex.f2e phrase-table.half --GoodTuring --DomainIndicator domains.5
phrase table looks fine to me
-phi
On Wed, Jul 23, 2014 at 11:42 AM, Marcin
So, adding --IgnoreSentenceId to score might fix that without
messing up your stuff? I guess I can do that if you can't be bothered,
Hieu.
On 23.07.2014 17:53, Philipp Koehn wrote:
Hi,
this is how extract is called:
extract corpus.en corpus.fr align extract 5
Hi Marcin
It appears that there is an --IgnoreSentenceId argument already, added
by Maria during last year's MTM
[gna]bhaddow: git blame ScoreFeature.cpp | grep Ignore
bff12363 (maria nadejde 2013-09-13 12:45:46 +0200 42) if (args[i] ==
--IgnoreSentenceId) {
cheers - Barry
On 23/07/14
i was doing it, but mine was a more holistic approach and it would have
broken compatibility.
so i can't be bothered
On 23 July 2014 16:56, Marcin Junczys-Dowmunt junc...@amu.edu.pl wrote:
So, adding --IgnoreSentenceId to score might fix that without
messing up your stuff? I guess I can
Oh. Good! I guess there is a lesson to be learned somewhere.
Thanks.
On 23.07.2014 18:06, Barry Haddow wrote:
Hi Marcin
It appears that there is an --IgnoreSentenceId argument already, added
by Maria during last year's MTM
[gna]bhaddow: git blame ScoreFeature.cpp | grep Ignore
If you mean a tool to create a corpus of aligned sentences, hunalign is
supposed to be ok
http://mokk.bme.hu/en/resources/hunalign/
If you mean a tool to create translation rules from source and target
sentences with word alignments, then the extract and extract-rules programs
in moses does
Hi,
Thanks for all reading suggestions. Please, see below.
On Thu, Apr 18, 2013, at 0:13, Philipp Koehn wrote:
Hi,
On Thu, Apr 4, 2013 at 9:04 AM, Per Tunedal per.tune...@operamail.com
wrote:
Hi,
obviously word alignment is very important for extraction of the most
useful phrases. What
Hi,
obviously word alignment is very important for extraction of the most
useful phrases. What about other ways for phrase extraction?
In your textbook 'Statistical Machine Translation' you do mention
another approach:
we may also use the expectation maximization algorithm to directly find
Hi,
I'm attaching a file. I got it when I executed the 5th step.
I don't know why the phrase table, extract.sorted.gz, etc. files are not created.
please help me.
And also I want to know about tokenization step.
In the tokenization step, rather than dividing a sentence into tokens, will any
extra processing
I did it, following Barry's suggestion.
I tested on a super small corpus with 2 pairs of sentences and generated 800
bilingual phrases :-D
Thanks to you, Barry and Prof. Marcello Federico.
On Thu, Jan 31, 2013 at 4:08 AM, Barry Haddow bhad...@staffmail.ed.ac.uk wrote:
Hi Cuong
If you pass the
Hi, the total number of extracted phrases in a sentence pair depends on:
- the particular word alignment you are considering
- the heuristic you adopt for the words left unaligned or aligned with the null
word
Greetings,
Marcello
---
Short from my mobile phone
On 30/Jan/2013, at 05:46 PM,
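
The two factors listed in the message above can be illustrated with a small sketch. This is the textbook consistent-phrase-pair extraction written from scratch, not the Moses extract code; the function name, the extend_unaligned switch, and the default max_len are assumptions made for illustration.

```python
def extract_phrase_pairs(src_len, tgt_len, alignment, max_len=7,
                         extend_unaligned=True):
    """Return the set of (src_start, src_end, tgt_start, tgt_end) spans
    (inclusive, 0-based) that are consistent with `alignment`, a set of
    (src_index, tgt_index) links.
    """
    tgt_aligned = {j for _, j in alignment}
    pairs = set()
    for s_start in range(src_len):
        for s_end in range(s_start, min(src_len, s_start + max_len)):
            # Target positions linked to the current source span.
            linked = [j for (i, j) in alignment if s_start <= i <= s_end]
            if not linked:
                continue
            t_start, t_end = min(linked), max(linked)
            if t_end - t_start + 1 > max_len:
                continue
            # Consistency: no target word inside the span may be linked
            # to a source word outside the source span.
            if any(t_start <= j <= t_end and not (s_start <= i <= s_end)
                   for (i, j) in alignment):
                continue
            # Optionally grow the target span over neighbouring unaligned words.
            lo = t_start
            while True:
                hi = t_end
                while True:
                    if hi - lo + 1 <= max_len:
                        pairs.add((s_start, s_end, lo, hi))
                    hi += 1
                    if not extend_unaligned or hi >= tgt_len or hi in tgt_aligned:
                        break
                lo -= 1
                if not extend_unaligned or lo < 0 or lo in tgt_aligned:
                    break
    return pairs


# Two source words, three target words, middle target word unaligned.
sparse = {(0, 0), (1, 2)}
with_ext = extract_phrase_pairs(2, 3, sparse)
without_ext = extract_phrase_pairs(2, 3, sparse, extend_unaligned=False)
```

With this toy alignment the two settings give different numbers of phrase pairs for the same sentence pair, so comparing two extractors one sentence at a time is a reasonable way to find where they differ.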
Hi Cuong
If you pass the aligned sentences through your phrase extraction, and
through Moses phrase extraction, one at a time then you should be able
to see where the difference is. As Marcello said, it could be in the
handling of unaligned words,
cheers - Barry
On 30/01/13 16:39, Cuong
25 matches