Re: [Moses-support] Proposal to replace vertical bar as factor delimeter

2010-11-15 Thread Christof Pintaske

Hello Lane,

frankly, I don't see this as so desirable. You just exchange a magic 
character with an even more magic one. Since the proposed character is 
not an ASCII character you'll eventually run into encoding problems. And 
for most people it'd be very difficult to type this character on the 
keyboard and to distinguish it from the regular | symbol. It just gets 
more and more obscure.


To really improve on the ugly magic file format issue, I'd love to see 
support for XML-based input and configuration files. There are tons of 
tools out there to handle XML files, and there are no limitations with 
respect to the content (even multi-line input would be possible). You 
can easily check conformance (using a DTD) and you can keep the files 
backwards compatible if you so desire. Of course, it's well understood 
that this is a major effort that's not easy to address.


just my two cents
Christof

PS: and yes, I spent substantial effort in making my tool chain pipe 
proof. I'd hate to sift through all that again for no practical gain.





On 11/15/10 12:55 PM, Lane Schwartz wrote:
I'd like to propose changing the current factor delimiter to something 
other than the single vertical bar |
Looking through the mailing archives, it seems that the failure to 
properly purge your corpus of vertical bars is a frequent source of 
headaches for users. I know I've encountered this problem before, but 
even knowing that I should do this, just today I had to track down 
another vertical bar-related problem.
I don't really care what the replacement character(s) ends up being, 
just so that any corpus munging related to this delimiter gets handled 
internally by moses rather than being the user's responsibility.
If moses could easily be modified to take a multi-character delimiter, 
that would probably be best. My suggestion for a single-character 
delimiter would be something with the following characteristics:

* Character should be printable (i.e. not a control character)
* Character should be one that's implemented in most commonly used fonts
* Character should be highly obscure, and extremely unlikely to appear 
in a corpus
* Character should not be confusable with any commonly used character.

Many characters in the Dingbats section of Unicode (block 2700) would 
fit these desiderata.
I suggest Unicode character 2759, MEDIUM VERTICAL BAR. This is a 
highly obscure printable character that looks like a thick vertical 
bar. It's obviously a vertical bar, but just as obviously not the same 
thing as the regular vertical bar |.

Cheers,
Lane
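
As a rough illustration of the corpus munging discussed in this thread, the following Python sketch hides literal vertical bars before factors are joined and restores them afterwards. The helper names and the use of U+2759 as the stand-in character are assumptions made for illustration only; Moses does not ship these functions.

```python
FACTOR_DELIMITER = "|"
# Assumption: U+2759 (MEDIUM VERTICAL BAR) stands in for a literal "|" in the data.
STAND_IN = "\u2759"


def join_factors(factors):
    """Join the factors of one word, hiding any literal '|' inside a factor."""
    return FACTOR_DELIMITER.join(
        factor.replace(FACTOR_DELIMITER, STAND_IN) for factor in factors
    )


def split_factors(token):
    """Split a factored token and restore the literal '|' characters."""
    return [
        factor.replace(STAND_IN, FACTOR_DELIMITER)
        for factor in token.split(FACTOR_DELIMITER)
    ]
```

A surface form such as `a|b` with the tag `NN` then survives a round trip through `join_factors` and `split_factors`, at the cost of introducing a non-ASCII character into the pipeline.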


___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support




[Moses-support] Word alignment information in binary phrase table

2010-10-21 Thread Christof Pintaske

 Hi,

train-model.perl with the parameter -phrase-word-alignment adds 
word-for-word alignment information to the phrase table. Unfortunately, 
this information gets lost when converting the textual phrase table 
into a binary format with processPhraseTable. Using 
processPhraseTable -alignment-info was meant to store the alignment 
information in the binary table as well. This functionality is broken 
since the format for the word alignment information changed, and 
currently no word alignment information is stored in the binary phrase 
tables. Being required to use the textual file limits the size of the 
phrase table with respect to the memory on the server.


The attached patch provides the missing changes. It stores new-style 
alignment information with the target candidates in the 
phrase-table.binphr.tgtdata.wa file and reads them out correspondingly 
(It doesn't split the alignment information into source and target 
alignment as in the old implementation/format. It keeps it in a format 
supported by TargetPhrase::SetAlignmentInfo(std::string)).


I tested the change with valgrind for both moses and 
processPhraseTable in a smaller moses translation system without any 
complaints. Both the translation and the alignment file that get 
produced with moses -use-alignment-info -print-alignment-info -T 
File are identical, regardless of text or binary phrase table. The 
patch should not change the behavior for phrase tables without 
word alignment.


I hope you find the patch useful and hopefully it can be committed to 
the repo. Of course, please let me know if any modifications are necessary 
or desirable.


best regards
Christof
diff -wcr moses-2010-09-24/misc/queryPhraseTable.cpp moses-2010-09-24.svn/misc/queryPhraseTable.cpp
*** moses-2010-09-24/misc/queryPhraseTable.cpp  2010-10-20 18:04:04.0 -0700
--- moses-2010-09-24.svn/misc/queryPhraseTable.cpp  2010-09-24 12:57:04.0 -0700
***************
*** 46,55 ****
      srcphrase = Moses::Tokenize<std::string>(line);
  
      std::vector<Moses::StringTgtCand> tgtcands;
!     std::vector<std::string> wordAlignment;
  
      if(useAlignments)
!       ptree.GetTargetCandidates(srcphrase, tgtcands, wordAlignment);
      else
        ptree.GetTargetCandidates(srcphrase, tgtcands);
  
--- 46,55 ----
      srcphrase = Moses::Tokenize<std::string>(line);
  
      std::vector<Moses::StringTgtCand> tgtcands;
!     std::vector<Moses::StringWordAlignmentCand> src_wa, tgt_wa;
  
      if(useAlignments)
!       ptree.GetTargetCandidates(srcphrase, tgtcands, src_wa, tgt_wa);
      else
        ptree.GetTargetCandidates(srcphrase, tgtcands);
  
***************
*** 60,66 ****
        std::cout << " |||";
  
        if(useAlignments) {
!         std::cout << wordAlignment[i] << " |||";
        }
  
        for(uint j = 0; j < tgtcands[i].second.size(); j++)
--- 60,78 ----
        std::cout << " |||";
  
        if(useAlignments) {
!         for(uint j = 0; j < src_wa[i].second.size(); j++)
!           if(src_wa[i].second[j] == -1)
!             std::cout << " ()";
!           else
!             std::cout << " (" << src_wa[i].second[j] << ")";
!         std::cout << " |||";
! 
!         for(uint j = 0; j < tgt_wa[i].second.size(); j++)
!           if(tgt_wa[i].second[j] == -1)
!             std::cout << " ()";
!           else
!             std::cout << " (" << tgt_wa[i].second[j] << ")";
!         std::cout << " |||";
        }
  
        for(uint j = 0; j < tgtcands[i].second.size(); j++)
diff -wcr moses-2010-09-24/moses/src/PDTAimp.h moses-2010-09-24.svn/moses/src/PDTAimp.h
*** moses-2010-09-24/moses/src/PDTAimp.h  2010-10-20 17:58:53.0 -0700
--- moses-2010-09-24.svn/moses/src/PDTAimp.h  2010-09-24 12:57:04.0 -0700
***************
*** 160,167 ****
  
      // get target phrases in string representation
      std::vector<StringTgtCand> cands;
!     std::vector<std::string> wacands;
!     m_dict->GetTargetCandidates(srcString,cands,wacands);
      if(cands.empty()) 
      {
        return 0;
--- 160,169 ----
  
      // get target phrases in string representation
      std::vector<StringTgtCand> cands;
!     std::vector<StringWordAlignmentCand> swacands;
!     std::vector<StringWordAlignmentCand> twacands;
! //

[Moses-support] KenLM distributed with Moses

2010-10-18 Thread Christof Pintaske
  Hi,

I saw that the KenLM source code is distributed with the Moses svn and can 
be set in configure. Is anybody here using it and willing to share some 
experiences? Is it thread-safe, and can it be used in Moses together with SRI 
and IRST? Any particular advantages? Is there any more information than 
just the README?

any hints are very welcome
Christof




Re: [Moses-support] Printing alignment information

2010-10-06 Thread Christof Pintaske

 Hello Souhir,

are you using a recent revision of moses? My phrase tables look a bit 
different from yours.

To see the alignment information I use the switches:

-use-alignment-info -print-alignment-info -T file

and the alignment information is written to the file.

best regards
Christof


On 10/6/10 7:23 AM, Souhir Gahbiche wrote:

Hi all,

I'd like to save the alignment information in the log file when decoding 
with moses.
I called moses with the -use-alignment-info and the 
-print-alignment-info options but still don't get any alignment 
information.

My phrase table looks like :
! hAyty  . ||| ! Haïti » . ||| (0) (1) (2) (3) ||| (0) (1) (2) 
(3) ||| 1 0.187346 1 0.0661179 2.718


Are these the wrong parameters for getting the alignment information?
Regards
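
For readers unfamiliar with the old-style layout quoted above (source, target, two alignment fields, scores), here is a small Python sketch that pulls such a line apart. It assumes each parenthesised group holds zero or more comma-separated word indices; the function names and the sample line in the note below are made up for illustration and are not part of Moses.

```python
import re


def parse_alignment_field(field):
    """Turn a field such as '(0) (1,2) ()' into [[0], [1, 2], []]."""
    groups = re.findall(r"\(([^)]*)\)", field)
    return [[int(i) for i in group.split(",") if i.strip()] for group in groups]


def parse_entry(line):
    """Split a line: source ||| target ||| src-align ||| tgt-align ||| scores."""
    fields = [field.strip() for field in line.split("|||")]
    if len(fields) != 5:
        raise ValueError("expected 5 fields, got %d" % len(fields))
    source, target, src_align, tgt_align, scores = fields
    return {
        "source": source.split(),
        "target": target.split(),
        "src_align": parse_alignment_field(src_align),
        "tgt_align": parse_alignment_field(tgt_align),
        "scores": [float(score) for score in scores.split()],
    }
```

Calling `parse_entry("a b ||| x y ||| (0) (1) ||| (0) (1) ||| 1 0.5 1 0.5 2.718")` yields two source words, two target words, and one alignment group per word.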




Re: [Moses-support] filter-pt doesn't work?

2010-09-24 Thread Christof Pintaske

 Hi,

it seems scores and extra have swapped places in the phrase table. 
The attached patch got me a lot further along; I changed the order in 
the output as well.


Not sure if

if (print_cooc_counts) os << " ||| " << pp.cfe << " " << pp.cf << " " << pp.ce;

if (print_neglog_significance) os << " ||| " << pp.nlog_pte;

still prints things in the correct order (filter-pt.cpp lines 144-145).

best regards
Christof
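
To make the field order handled by the patch easier to follow, here is a Python sketch of the same parsing idea: the first three "|||"-separated fields are taken as source phrase, target phrase and scores, and everything after that is kept verbatim as "extra". This mirrors my reading of the patched filter-pt.cpp and is not code from Moses; the function name is hypothetical.

```python
SEPARATOR = " ||| "


def split_entry(line):
    """Return (f_phrase, e_phrase, scores, extra) for one phrase-table line."""
    parts = line.rstrip("\n").split(SEPARATOR)
    if len(parts) < 3:
        raise ValueError("expected at least 3 fields, got %d" % len(parts))
    f_phrase, e_phrase, scores = parts[0], parts[1], parts[2]
    # Whatever follows the scores (alignment, counts, ...) is kept untouched.
    extra = SEPARATOR.join(parts[3:])
    return f_phrase, e_phrase, [float(s) for s in scores.split()], extra
```

With a made-up five-field line such as `das ||| the ||| 0.5 0.25 ||| 0-0 ||| 10 20`, the scores come from the third field and `extra` is `0-0 ||| 10 20`; a three-field line simply yields an empty `extra`.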



On 9/24/10 1:37 PM, Christof Pintaske wrote:

   Hi,

I just updated my moses installation to trunk. Unfortunately I found
that filter-pt is now crashing instead of pruning. The patch below fixed
the bleeding for me. However even with that patch I receive plenty of
error messages:

  No occurrences found

and the pruned table is just too small to be true. Does filter-pt get
out of step because the phrase table now has 5 fields instead of 3 (my
old installation is from June)? filter-pt.cpp seems to be completely
unchanged compared to the June installation.

Any hints or fixes are welcome.

best regards
Christof



diff -wc sigtest-filter/filter-pt.cpp ../moses-2010-06-04/sigtest-filter/filter-pt.cpp
*** sigtest-filter/filter-pt.cpp  2010-09-24 13:19:34.0 -0700
--- ../moses-2010-06-04/sigtest-filter/filter-pt.cpp  2010-06-04 15:33:39.0 -0700
***************
*** 103,111 ****
        }
      }
    }
-   if (i != scores.end()) {
    ++i;
-   }
    char f[24];
    char *fp=f;
    while (i != scores.end() && *i != ' ') {


*** filter-pt.cpp   2010-09-24 14:45:19.0 -0700
--- /export/home/moses/src/moses-2010-06-04/sigtest-filter/filter-pt.cpp   2010-06-04 15:33:39.0 -0700
***************
*** 87,105 ****
  {
    size_t pos = 0;
    std::string::size_type nextPos = str.find(SEPARATOR, pos);
!   this->f_phrase = str.substr(pos,nextPos);
! 
!   pos = nextPos + SEPARATOR.size();
!   nextPos = str.find(SEPARATOR, pos);
!   this->e_phrase = str.substr(pos,nextPos-pos);
! 
!   pos = nextPos + SEPARATOR.size();
    nextPos = str.find(SEPARATOR, pos);
!   this->scores = str.substr(pos,nextPos-pos);
! 
!   pos = nextPos + SEPARATOR.size();
!   this->extra = str.substr(pos);
! 
    int c = 0;
    std::string::iterator i=scores.begin();
    if (index > 0) {
--- 87,98 ----
  {
    size_t pos = 0;
    std::string::size_type nextPos = str.find(SEPARATOR, pos);
!   this->f_phrase = str.substr(pos,nextPos); pos = nextPos + SEPARATOR.size();
    nextPos = str.find(SEPARATOR, pos);
!   this->e_phrase = str.substr(pos,nextPos-pos); pos = nextPos + SEPARATOR.size();
!   nextPos = str.rfind(SEPARATOR);
!   this->extra = str.substr(pos, ((nextPos > pos)?(nextPos-pos):0));
!   this->scores = str.substr(nextPos + SEPARATOR.size(),std::string::npos);
    int c = 0;
    std::string::iterator i=scores.begin();
    if (index > 0) {
***************
*** 110,118 ****
        }
      }
    }
-   if (i != scores.end()) {
    ++i;
-   }
    char f[24];
    char *fp=f;
    while (i != scores.end() && *i != ' ') {
--- 103,109 ----
***************
*** 139,146 ****
  std::ostream& operator << (std::ostream& os, const PTEntry& pp)
  {
    os << pp.f_phrase << " ||| " << pp.e_phrase;
-   os << " ||| " << pp.scores;
    if (pp.extra.size()>0) os << " ||| " << pp.extra;
    if (print_cooc_counts) os << " ||| " << pp.cfe << " " << pp.cf << " " << pp.ce;
    if (print_neglog_significance) os << " ||| " << pp.nlog_pte;
    return os;
--- 130,137 ----
  std::ostream& operator << (std::ostream& os, const PTEntry& pp)
  {
    os << pp.f_phrase << " ||| " << pp.e_phrase;
    if (pp.extra.size()>0) os << " ||| " << pp.extra;
+   os << " ||| " << pp.scores;
    if (print_cooc_counts) os << " ||| " << pp.cfe << " " << pp.cf << " " << pp.ce;
    if (print_neglog_significance) os << " ||| " << pp.nlog_pte;
    return os;


Re: [Moses-support] Use of unfactored training data set in moses???

2010-05-22 Thread Christof Pintaske
Where did you get your Moses installation from? train-model.perl 
should be in moses/bin/scripts-*/training. At least it has always been 
there for me. If that's not the case, then you may want to check the 
source or check out a fresh version from the repository.


hope that helps
Christof


Dear All,

I am doing research on the development of a statistical translation 
system for Sri Lankan local languages (Sinhala and Tamil). The 
available corpus is unfactored and was created by me. So, I would 
like to know whether the script found in 
moses-scripts/scripts-timestamp/training/train-factored-phrase-model.perl 
is suitable for the training. The Moses manual specifies a 
script called train-model.perl for unfactored model training, which 
I was unable to locate. I would appreciate your help with this issue 
as soon as possible.


Thank you.





Re: [Moses-support] What is the use of the lm parameter in the model training stage?

2010-05-20 Thread Christof Pintaske

On 5/20/10 8:12 PM, yifeng...@sina.com wrote:


In Factored Tutorial, the first example is:

% train-model.perl \
--corpus factored-corpus/proj-syndicate \
--root-dir unfactored \
--f de --e en \
--lm 0:3:factored-corpus/surface.lm:0

I think the language model is usually used in the decoding stage in 
SMT. What is the use of the lm parameter which lists a language model 
in the model training stage?


I'm not sure if it's really required, but it's written to the moses.ini, 
which you later need in decoding. Otherwise you'd have to patch the 
moses.ini manually.


just my 2 cents of wisdom
Christof
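
To illustrate what "written to the moses.ini" could look like, here is a Python sketch that turns an --lm value such as 0:3:factored-corpus/surface.lm:0 into a one-line [lmodel-file] entry. It assumes the --lm fields are factor:order:file:type and that the ini line is ordered "type factor order file" (as in the "lmodel-file: 1 0 3 ..." log line quoted elsewhere in this archive). The function name and the default type of 0 are my own assumptions, not taken from train-model.perl.

```python
def lm_spec_to_ini_line(spec):
    """Convert 'factor:order:file[:type]' into 'type factor order file'."""
    parts = spec.split(":")
    if len(parts) not in (3, 4):
        raise ValueError("expected factor:order:file[:type], got %r" % spec)
    factor, order, path = parts[0], parts[1], parts[2]
    # Assumption: a missing type field falls back to 0.
    lm_type = parts[3] if len(parts) == 4 else "0"
    return "%d %d %d %s" % (int(lm_type), int(factor), int(order), path)
```

For the tutorial example this gives `0 0 3 factored-corpus/surface.lm`, which is the kind of line the decoder later reads back. Paths containing a colon would need extra care in a real script.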



Re: [Moses-support] Build Moses for translating English to Chinese.

2010-02-11 Thread Christof Pintaske
Hi,

you may want to have a closer look at tokenizer.perl which is used for 
word-breaking. It seems there is some special logic to handle English, 
French, and Italian but nothing much else.

I'm not sure if you can or plan to reveal your findings here on the list 
but at any rate I'd be very interested to learn how Chinese worked for you.

best regards
Christof

nati g wrote:
 Hello,
  Do we need any special scripts to build moses for translating english 
 to chinese.
  
 thanks in advance.
 
 
 
 


[Moses-support] parse-de-bitpar.perl peculiarities

2010-02-09 Thread Christof Pintaske
Hi,

in parse-de-bitpar.perl the code sequence

while(<STDIN>)
{
    foreach (split)
    {
        s/\(/\*LRB\*/g;
        s/\)/\*RRB\*/g;
        print TMP $_."\n";
    }
    print TMP "\n";
}

adds a newline after each single word. Is this required? To me it looks 
like bitpar parses sentences on a single line just fine. I'm asking 
because this behavior causes trouble with my data down the line:




Annotating my English (source language) corpus with bitpar (while 
keeping the French target corpus plain) adds empty lines to the 
annotated English source. This brings source and target file out of sync.

The root cause seems to be that internally parse-de-bitpar.perl adds a 
newline after each word before feeding it to bitpar. In addition iconv 
may eliminate certain characters which lead to empty lines that are 
eventually interpreted as a sentence break.

An (admittedly very ugly) segment like:

 you have been invited to community , collection1 by user1 ” , “ 
message from ” , and “ please use the following url to access the 
community .

gets parsed without any obvious error by bitpar when I feed it directly, 
or even after being filtered initially through iconv. However within 
parse-de-bitpar.perl it gets first converted into:


you
have
been
invited
to
community
,
collection1
by
user1

,

message
[...]


Which bitpar parses into 5 sentences

(TOP (X/domV (NP/base (CD \))(SBAR/0  (-NONE-(0))(S/fin 
(NP-SBJ/n3s/base+\#?NPSBJ? (PRP/n3s you) [...]
No parse for: ,
No parse for: message from
No parse for: , and
(TOP (S/fin/. (NP-SBJ/n3s/base+\#?NPSBJ? (NN please))(VP/n3s_?NPSBJ? 
(VVP/nst use)(NP/base (DT/the the)(JJ following)[...]


parse-de-bitpar.perl changes the "No parse for" lines into empty lines. Since 
1 sentence gets unfolded into 5 lines, the English source and the 
unannotated target get out of sync.
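
The effect described above can be reproduced in a few lines of Python. This is only a sketch of the mechanism, not the wrapper's actual code: a sentence is written one token per line with a blank line as sentence break, so any token that has become empty (for instance after a character was dropped during conversion) looks like an extra sentence break. The function names and the `drop_empty_tokens` switch are hypothetical.

```python
def to_one_token_per_line(sentences, drop_empty_tokens=True):
    """Write one token per line and one blank line after each sentence."""
    lines = []
    for sentence in sentences:
        for token in sentence:
            if drop_empty_tokens and token == "":
                continue  # an empty token would look like a sentence break
            lines.append(token)
        lines.append("")
    return lines


def count_sentences(lines):
    """Count blank-line-separated blocks, as a line-based parser would see them."""
    count, in_sentence = 0, False
    for line in lines:
        if line and not in_sentence:
            count += 1
        in_sentence = bool(line)
    return count
```

With the tokens `["user1", "", ",", "", "message"]`, the unguarded output is read as three sentences, while the guarded output stays at one. Replacing the empty token with a placeholder instead of dropping it would keep the token count stable, which may matter for later alignment.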

any comments are welcome

best regards
Christof





[Moses-support] moses_chart: bug in parse-de-bitpar.perl ?

2010-02-04 Thread Christof Pintaske
Hi,

I'm playing with bitpar for parsing and annotating English content. I 
modified parse-de-bitpar.perl to use the TraceParser grammar files 
instead of the German Tiger files. When I tried to annotate my corpus 
parse-de-bitpar.perl died on me on two occasions:

1. a grammar like (a (b (c))(d))  does not get parsed correctly. 
parse-de-bitpar.perl chokes on the double (or multiple) closing brackets 
c))

2. quoted brackets are not parsed correctly. bitpar threw something like 
\\(xyz\)\ at parse-de-bitpar.perl, which brought it down. I can provide 
exact examples if anybody is interested.

The patch below did it for me.

Does anybody have experiences to share regarding syntax annotation? Is 
collins the way to go for English?

best regards
Christof




diff -w local/bin/parse-bitpar.perl ~/libexec/moses-chart/bin/scripts/training/wrappers/parse-de-bitpar.perl
61c55
<   my ($label,$rest) = split(/(?<!\\)[\)\( ]/,substr($line,$i+1));
---
>   my ($label,$rest) = split(/[\( ]/,substr($line,$i+1));
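
To show what the changed split pattern is meant to achieve, here is a Python sketch comparing the two variants. It assumes the patched pattern is intended to split on brackets and spaces only when they are not preceded by a backslash, and to treat closing brackets as delimiters as well. The helper name is hypothetical and the inputs are made up.

```python
import re

# Original behaviour: split the label off at the first "(" or space.
OLD_SPLIT = re.compile(r"[( ]")
# Patched behaviour (as assumed here): also split at ")", but skip
# any bracket or space that directly follows a backslash.
NEW_SPLIT = re.compile(r"(?<!\\)[)( ]")


def first_label(text, pattern):
    """Return the part of text before the first delimiter match."""
    return pattern.split(text, maxsplit=1)[0]
```

For the escaped input `\(xyz\) rest`, the old pattern cuts the label off right after the backslash, while the new one returns `\(xyz\)`. For `c))`, the old pattern finds no delimiter and returns the whole string, whereas the new one returns `c`.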


[Moses-support] moses_chart: usage information for phrase_extract

2010-02-04 Thread Christof Pintaske
Hi,

here's a minor discovery in phrase_extract

phrase_extract does not give any usage information, even though it seems 
somebody had the intention to do so:

if (argc < 1)
{
  cerr << "syntax: relax-parse < in-parse > out-parse ["
       << " --LeftBinarize | ---RightBinarize |"
       << " --SAMT 1-4 ]" << endl;
  exit(1);
}

argc is of course always 1 or greater.

It would be great if phrase_extract supported something like 
--help or -h. Or maybe it could require at least one argument and 
print the usage if (argc < 2).
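
The same pitfall exists in most languages, since the argument list always contains the program name. A Python sketch of the suggested check, with a hypothetical function name and assuming that at least one real argument is required:

```python
def wants_usage(argv):
    """Return True when the usage text should be printed.

    argv[0] is the program name, so len(argv) is never below 1;
    comparing against 2 is what detects "no arguments given".
    """
    return len(argv) < 2 or argv[1] in ("-h", "--help")
```

A caller would pass `sys.argv` and print the usage text to stderr before exiting with a non-zero status when this returns True.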


of course this is just minor stuff.

best regards
Christof




Re: [Moses-support] moses_chart: tuning with mert-moses-new.pl doesn't change the moses.ini

2010-02-02 Thread Christof Pintaske



Hieu Hoang wrote:
 hi christof
 
 the mert-moses-new hasn't been modified to do chart decoding yet, only 
 the original mert-moses.pl. There are just a few small changes that need 
 to be done, but no one's gotten round to doing them yet

many, many thanks, that did it! I used mert-moses.pl and it completed 
tuning overnight!

Results (BLEU scores) for my hierarchical phrase model are slightly 
worse than with the good old phrase-based decoder. Does anybody have 
experience with whether tagging the (English) source with an annotating 
parser (bitpar and collins are mentioned in the documentation) improves things?

best regards
Christof





 
 On 01/02/2010 19:13, Christof Pintaske wrote:
 Hi,

 while running mert-moses-new.pl to tune the chart-decoder I get this output:

 [...]
 The decoder returns the scores in this order: lm tm tm tm tm tm tm w
 [...]
 Executing: /export/home/moses/libexec/moses-chart/bin/extractor
 --scfile run1.scores.dat [...]
 The decoder produced also some 'tm' scores, but we do not know the
 ranges for them, no way to optimize them

 it seems it dies on these 'tm' scores. Is there a way to prevent the
 decoder from producing these scores?


 regards
 Christof



 Christof Pintaske wrote:

 Hi,

 I'm running

 mert-moses-new.pl
~/en-fr_chart/tuning/token_lowercase.en
~/en-fr_chart/tuning/token_lowercase.fr
/export/home/moses/libexec/moses-chart/bin/moses_chart
~/en-fr_chart/training/pm/model/moses.ini
--mertdir=/export/home/moses/libexec/moses-chart/bin
--working-dir=~/en-fr_chart/tuning
--decoder-flags=-v 0 --no-filter-phrase-table

 for tuning. It performs exactly one iteration and writes a
 run1.moses.ini in the tuning directory. That moses.ini is almost
 identical to the one that I got after training (see below). Am I missing
 the obvious?

 any hints are welcome

 Christof
  


Re: [Moses-support] moses for haitian relief

2010-01-27 Thread Christof Pintaske
Hi Christopher,

when I tried to build srilm on an amd64 machine running Linux, I found 
that srilm used the common/Makefile.machine.i686 Makefile. This, in 
turn, defines the compiler flags GCC_FLAGS = -m32 -mtune=pentium3 -W
I removed -m32 -mtune=pentium3 from the flags to build regular 64-bit 
code. After that I could link the result to moses just fine.

hope that helps
Christof


christopher taylor wrote:
 hello everyone!
 
 i'm currently trying to build an instance of moses to support
 crisiscommons.org's machine translation project (i'm currently the
 PM).
 
 i really want to give moses a spin *but* i'm having issues building it.
 
 my build trouble is related to liboolm.a - here's the output from my compilation:
 
 Making all in moses-cmd/src
 make[2]: Entering directory `../mt/moses/moses-cmd/src'
 g++  -g -O2  -L..//mt/srilm/lib/i686 -L..//mt/irstlm//lib/x86_64 -o
 moses  Main.o mbr.o IOWrapper.o TranslationAnalysis.o
 -L../../moses/src -lmoses   -loolm -ldstruct -lmisc -lirstlm -lz
 /usr/bin/ld: skipping incompatible ../mt/srilm/lib/i686/liboolm.a when
 searching for -loolm
 /usr/bin/ld: cannot find -loolm
 collect2: ld returned 1 exit status
 make[2]: *** [moses] Error 1
 make[2]: Leaving directory `..//mt/moses/moses-cmd/src'
 make[1]: *** [all-recursive] Error 1
 make[1]: Leaving directory `..//mt/moses'
 make: *** [all] Error 2
 
 thanks so much for your help!
 
 chris taylor


Re: [Moses-support] Moses server dies when loading language model

2010-01-27 Thread Christof Pintaske
I'd think that you did not compile or link successfully against SRILM. 
As a consequence, LanguageModelInternal tries to load the language model 
instead of the SRILM code taking over.

just my 2 cents
Christof

Panagiotis Kanavos wrote:
 Hi,
 
 
 I downloaded moses from svn and followed the steps to build the moses 
 server with multithreading. I think I built it successfully, but when I 
 run it I get this error when it starts loading the language model:
 
 
 mosesserver: LanguageModelInternal.cpp:22: virtual bool 
 Moses::LanguageModelInternal::Load(const std::string&, 
 Moses::FactorType, float, size_t): Assertion `nGramOrder <= 3' failed.
 
 
 I had already installed moses on my Ubuntu 64bit Server using Eric 
 Nichols packages, which still runs fine.
 
 
 TIA
 
 
 
 


Re: [Moses-support] moses_chart and recaser/phrase-table ?

2010-01-22 Thread Christof Pintaske
Hello Hieu,

I actually used the chart decoder and scripts to train the recaser and 
consequently I used the chart decoder for the recaser decoding as well.

It seems that the chart decoder scripts still write an old-style moses.ini when 
used for recaser training; however, the chart decoder is not able to read it.

In Moses::StaticData::LoadPhraseTables in StaticData.cpp:832

 string filePath= token[4];

it expects [ttable-file] to have at least 5 entries, hardcoded. So the 
recaser scripts are actually not usable with the chart decoder. You may 
want to add an assertion for that in the code.
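
The kind of guard suggested here could look like the following Python sketch. It only relies on what is stated above (the path is read from index 4, so at least 5 entries are needed); the function name is hypothetical, and the five-field sample used in the note below is made up rather than taken from a real chart-decoder moses.ini.

```python
def ttable_file_path(line, min_fields=5):
    """Return the file path from a [ttable-file] line, or fail loudly."""
    tokens = line.split()
    if len(tokens) < min_fields:
        raise ValueError(
            "[ttable-file] needs at least %d entries, got %d: %r"
            % (min_fields, len(tokens), line)
        )
    return tokens[4]
```

The old-style line from the log in this thread, `0 0 5 /export/.../phrase-table.gz`, has only four entries and would raise a readable error instead of crashing, while a made-up line such as `0 0 0 5 /tmp/rule-table.gz` returns the path.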



As a side note, I saw that the chart decoder always links to libpthread, 
regardless of --enable-threads. Is the chart decoder multithreaded? And 
can it still be used with the irstlm language model?

best regards
Christof



Hieu Hoang wrote:
 hi christof
 
 there are small fiddly changes to the ini file format, so the chart decoder 
 isn't backward compatible. You should use the trunk decoder if the 
 data was trained using the trunk scripts
 
 On 22/01/2010 00:20, Christof Pintaske wrote:
 Hi,

 is the moses/moses_chart executable from the mt3_chart branch supposed
 to be usable as a recaser (that is using a phrase-table) ? When I do so
 then I get a SEGV (see stacktrace at the very end).

 Working on the same files with moses from svn/trunk works fine.

 I read somewhere that irstlm is not thread-safe and in the debug output
 I see that a new thread has been started. Is that the problem? I thought
 I'd be safe because I did *not* configure moses with --enable-threads
 nor with boost.

 many thanks
 Christof




 (gdb) run -f ~/data/engine/en-fr_chart/recaser/moses.ini < x
 Starting program: /export/home/moses/libexec/moses-chart/bin/moses -f
 ~/data/engine/en-fr_chart/recaser/moses.ini < x
 [Thread debugging using libthread_db enabled]
 Defined parameters (per moses.ini or switch):
  config: /export/home/moses/data/engine/en-fr_chart/recaser/moses.ini
  distortion-limit: 6
  input-factors: 0
  lmodel-file: 1 0 3
 /export/home/moses/data/engine/en-fr_chart/recaser/cased.irstlm.gz
  mapping: 0 T 0
  ttable-file: 0 0 5
 /export/home/moses/data/engine/en-fr_chart/recaser/phrase-table.gz
  ttable-limit: 20
  weight-d: 0.6
  weight-l: 0.5000
  weight-t: 0.2 0.2 0.2 0.2 0.2
  weight-w: -1
 Added 0 Distortion 0-0
 Added 1 !UnknownWordPenalty 1-1
 Added 2 WordPenalty 2-2
 Loading lexical distortion models...
 have 0 models
 Start loading LanguageModel
 /export/home/moses/data/engine/en-fr_chart/recaser/cased.irstlm.gz :
 [0.000] seconds
 Added 3 LanguageModel 3-3
 In LanguageModelIRST::Load: nGramOrder = 3
 Loading LM file (no MAP)
 iARPA
 loadtxt()
 1-grams: reading 8178 entries
 2-grams: reading 37042 entries
 Detaching after fork from child process 4881.
 3-grams: reading 61742 entries
 Detaching after fork from child process 4882.
 done
 OOV code is 8177
 OOV code is 8177
 IRST: m_unknownId=8177
 Finished loading LanguageModels : [1.000] seconds
 [New Thread 0x2b0788eaa940 (LWP 4878)]

 Program received signal SIGSEGV, Segmentation fault.
 0x00398a29c8c8 in std::basic_string<char, std::char_traits<char>,
 std::allocator<char> >::basic_string () from /usr/lib64/libstdc++.so.6
 (gdb) where
 #0  0x00398a29c8c8 in std::basic_string<char,
 std::char_traits<char>, std::allocator<char> >::basic_string () from
 /usr/lib64/libstdc++.so.6
 #1  0x0045ca7a in Moses::StaticData::LoadPhraseTables
 (this=0x74c8e0) at StaticData.cpp:832
 #2  0x004646ea in Moses::StaticData::LoadData (this=0x74c8e0,
 parameter=<value optimized out>) at StaticData.cpp:406
 #3  0x00406fd1 in main (argc=3, argv=0x7fffeccfc5d8) at
 ../../moses/src/StaticData.h:217


[Moses-support] moses-chart: buggy line in extract.o.sorted

2010-01-20 Thread Christof Pintaske
Hi,

my extract.o.gz and extract.o.sorted files produce a large number 
of "buggy line" error messages. For example:

buggy line (o_previous:): [X] , [X] and ||| [X] , [X] et |||
buggy line (o_following:): [X] , [X] and ||| [X] , [X] et |||

In fact, extract has generated the respective line(s) in extract.o 
without a mono, other, or swap attribute, which seems to trigger these 
complaints. For example:

[X] interface ( cli ||| [X] de l' interface ||| other swap
[X] interface [X] ||| [X] de [X] interface |||
[X] [X] cli ||| [X] de l' [X] |||
[X] interface ( cli ) ||| [X] de l' interface ||| other swap
[X] interface ( [X] ||| [X] de [X] interface |||
[X] interface [X] ) ||| [X] de [X] interface |||
[X] [X] cli ) ||| [X] de l' [X] |||

is this something I can ignore? I'd really like to understand what's 
going on here :-)

many thanks and best regards
Christof
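
To find the affected lines up front, a check along the following lines could be run over extract.o. This Python sketch assumes the orientation attribute (mono, swap, other) is whatever follows the second "|||" separator, as in the examples above; the function name is hypothetical and this is not part of the Moses scripts.

```python
def lines_missing_orientation(lines):
    """Return the lines whose third '|||'-separated field is empty."""
    missing = []
    for line in lines:
        fields = line.rstrip("\n").split("|||")
        orientation = fields[2].split() if len(fields) > 2 else []
        if not orientation:
            missing.append(line)
    return missing
```

Applied to the first two example lines above, only the second one (`[X] interface [X] ||| [X] de [X] interface |||`) is reported.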


Re: [Moses-support] moses-chart: buggy line in extract.o.sorted

2010-01-20 Thread Christof Pintaske
Philipp Koehn wrote:
 Hi,
 
 it seems you are using hierarchical rules and lexicalized
 reordering at the same time. This is asking for trouble...

oops, I had blindly carried over all the command-line arguments from my 
phrase-based training. Now it works.

thanks a lot
Christof


 
 -phi
 
 On Wed, Jan 20, 2010 at 11:05 PM, Christof Pintaske
 christof.pinta...@sun.com wrote:
 Hi,

 my extract.o.gz respectively extract.o.sorted produce a large number
 of error messages: buggy line. For example:

 buggy line (o_previous:): [X] , [X] and ||| [X] , [X] et |||
 buggy line (o_following:): [X] , [X] and ||| [X] , [X] et |||

 in fact extract has generated the respective line(s) in extract.o
 without mono, other or swap attribute which seem to trigger these
 complaints. For example:

 [X] interface ( cli ||| [X] de l' interface ||| other swap
 [X] interface [X] ||| [X] de [X] interface |||
 [X] [X] cli ||| [X] de l' [X] |||
 [X] interface ( cli ) ||| [X] de l' interface ||| other swap
 [X] interface ( [X] ||| [X] de [X] interface |||
 [X] interface [X] ) ||| [X] de [X] interface |||
 [X] [X] cli ) ||| [X] de l' [X] |||

 is this something I can ignore? I'd really like to understand what's
 going on here :-)

 many thanks and best regards
 Christof


Re: [Moses-support] prune-lm in endless loop

2009-12-02 Thread Christof Pintaske
Hi,

just for the record, recompiling prune-lm with the additional 
compiler option

-fno-strict-aliasing

solved the problem. It seems gcc 4.1.2 didn't like the magic casting 
that's used in some of the source files.

Is there a place where these kinds of issues are documented?

best regards
Christof




Christof Pintaske wrote:
 Hi,
 
 I created a 3-gram LM with the irstlm toolkit (5.0.22). The LM has about 
 25M entries:
 
 ngram 1= 300209
 ngram 2= 4864097
 ngram 3= 20336549
 
 
 I tried to prune it with prune-lm on a Linux machine.
 
 prune-lm --threshold=1e-6,1e-6 sun.irstlm.gz sun.pruned.irlstlm  x.out
 
 In the output file x.out I get repeated error messages
 
 ng: qu0 ts=1.00059 tbs=0.0196106 k=0 ns=20
 
 probably more than 100M identical ones. After running the pruning over 
 night the stderr output reached 100GB size and I stopped the process.
 
 Just looking at the source code I assume that lmtable::wdprune() loops 
 endlessly over the prune: goto statement. Are there any problems with 
 the pscale() routine?
 
 Any hints on where to look are highly appreciated.
 
 best regards
 Christof
 