Re: [Moses-support] Proposal to replace vertical bar as factor delimiter
Hello Lane,

frankly, I don't see this as all that desirable. You just exchange one magic character for an even more magic one. Since the proposed character is not an ASCII character, you'll eventually run into encoding problems. And most people would find it very difficult to type this character on the keyboard and to distinguish it from the regular | symbol. It just gets more and more obscure.

To really improve on the ugly magic file format issue, I'd love to see support for XML-based input and configuration files. There is plenty of tooling out there to handle XML files, and there are no limitations with respect to the content (even multi-line input would be possible). You can easily check conformance (using a DTD), and you can keep the files backwards compatible if you so desire.

Of course, it's well understood that this is a major effort that's not easy to address.

just my two cents
Christof

PS: and yes, I spent substantial effort on making my tool chain pipe-proof. I'd hate to sift through all that again for no practical gain.

On 11/15/10 12:55 PM, Lane Schwartz wrote:

I'd like to propose changing the current factor delimiter to something other than the single vertical bar |.

Looking through the mailing list archives, it seems that failing to properly purge your corpus of vertical bars is a frequent source of headaches for users. I know I've encountered this problem before, but even knowing that I should do this, just today I had to track down another vertical-bar-related problem.

I don't really care what the replacement character(s) ends up being, as long as any corpus munging related to this delimiter gets handled internally by moses rather than being the user's responsibility. If moses could easily be modified to take a multi-character delimiter, that would probably be best.

My suggestion for a single-character delimiter would be something with the following characteristics:

* Character should be printable (i.e. not a control character)
* Character should be one that's implemented in most commonly used fonts
* Character should be highly obscure, and extremely unlikely to appear in a corpus
* Character should not be confusable with any commonly used character

Many characters in the Dingbats section of Unicode (block 2700) would fit these desiderata. I suggest Unicode character 2759, MEDIUM VERTICAL BAR. This is a highly obscure printable character that looks like a thick vertical bar. It's obviously a vertical bar, but just as obviously not the same thing as the regular vertical bar |.

Cheers,
Lane

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
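To make the multi-character delimiter idea from this thread concrete, here is a minimal sketch of splitting one token into its factors on an arbitrary delimiter string. The function name `splitFactors` is hypothetical and this is not how moses does it internally; it only illustrates that once the delimiter is configurable, a literal | inside a word no longer needs to be purged from the corpus.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of moses): split one token into its
// factors on a delimiter that may be longer than one byte.
std::vector<std::string> splitFactors(const std::string& token,
                                      const std::string& delim) {
    std::vector<std::string> factors;
    if (delim.empty()) {
        // Nothing to split on: treat the whole token as a single factor.
        factors.push_back(token);
        return factors;
    }
    std::string::size_type pos = 0;
    while (true) {
        const std::string::size_type next = token.find(delim, pos);
        if (next == std::string::npos) {
            factors.push_back(token.substr(pos));  // last factor
            break;
        }
        factors.push_back(token.substr(pos, next - pos));
        pos = next + delim.size();
    }
    return factors;
}
```

With the delimiter set to the UTF-8 encoding of U+2759 (the bytes E2 9D 99), a token such as `a|b` followed by the delimiter and `NN` splits into the surface form `a|b` and the factor `NN`. Christof's encoding concern still applies: the delimiter is matched bytewise, so corpus and delimiter must be in the same encoding.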
[Moses-support] Word alignment information in binary phrase table
Hi,

train-model.perl with the parameter -phrase-word-alignment adds word-for-word alignment information to the phrase table. Unfortunately this information gets lost when converting the textual phrase table into a binary format with processPhraseTable.

Using processPhraseTable -alignment-info was meant to store the alignment information in the binary table as well. This functionality has been broken since the format of the word alignment information changed, and currently no word alignment information is stored in the binary phrase tables. Being required to use the textual file limits the size of the phrase table to what fits into the memory of the server.

The attached patch provides the missing changes. It stores new-style alignment information with the target candidates in the phrase-table.binphr.tgtdata.wa file and reads it back correspondingly. (It doesn't split the alignment information into source and target alignment as in the old implementation/format. It keeps it in a format supported by TargetPhrase::SetAlignmentInfo(std::string).)

I tested the change with valgrind for both moses and processPhraseTable in a smaller moses translation system without any complaints. Both the translation and the alignment file produced with

moses -use-alignment-info -print-alignment-info -T File

are identical, regardless of whether the phrase table is textual or binary. The patch should not change the behavior for phrase tables without word alignment.

I hope you find the patch useful, and hopefully it can be committed to the repo. Of course, please let me know if any modifications are necessary or desirable.

best regards
Christof

diff -wcr moses-2010-09-24/misc/queryPhraseTable.cpp moses-2010-09-24.svn/misc/queryPhraseTable.cpp
*** moses-2010-09-24/misc/queryPhraseTable.cpp 2010-10-20 18:04:04.0 -0700
--- moses-2010-09-24.svn/misc/queryPhraseTable.cpp 2010-09-24 12:57:04.0 -0700
***************
*** 46,55 ****
      srcphrase = Moses::Tokenize<std::string>(line);
      std::vector<Moses::StringTgtCand> tgtcands;
!     std::vector<std::string> wordAlignment;
      if(useAlignments)
!       ptree.GetTargetCandidates(srcphrase, tgtcands, wordAlignment);
      else
        ptree.GetTargetCandidates(srcphrase, tgtcands);
--- 46,55 ----
      srcphrase = Moses::Tokenize<std::string>(line);
      std::vector<Moses::StringTgtCand> tgtcands;
!     std::vector<Moses::StringWordAlignmentCand> src_wa, tgt_wa;
      if(useAlignments)
!       ptree.GetTargetCandidates(srcphrase, tgtcands, src_wa, tgt_wa);
      else
        ptree.GetTargetCandidates(srcphrase, tgtcands);
***************
*** 60,66 ****
      std::cout << " |||";
      if(useAlignments) {
!       std::cout << wordAlignment[i] << " |||";
      }
      for(uint j = 0; j < tgtcands[i].second.size(); j++)
--- 60,78 ----
      std::cout << " |||";
      if(useAlignments) {
!       for(uint j = 0; j < src_wa[i].second.size(); j++)
!         if(src_wa[i].second[j] == -1)
!           std::cout << "()";
!         else
!           std::cout << "(" << src_wa[i].second[j] << ")";
!       std::cout << " |||";
!
!       for(uint j = 0; j < tgt_wa[i].second.size(); j++)
!         if(tgt_wa[i].second[j] == -1)
!           std::cout << "()";
!         else
!           std::cout << "(" << tgt_wa[i].second[j] << ")";
!       std::cout << " |||";
      }
      for(uint j = 0; j < tgtcands[i].second.size(); j++)
diff -wcr moses-2010-09-24/moses/src/PDTAimp.h moses-2010-09-24.svn/moses/src/PDTAimp.h
*** moses-2010-09-24/moses/src/PDTAimp.h 2010-10-20 17:58:53.0 -0700
--- moses-2010-09-24.svn/moses/src/PDTAimp.h 2010-09-24 12:57:04.0 -0700
***************
*** 160,167 ****
      // get target phrases in string representation
      std::vector<StringTgtCand> cands;
!     std::vector<std::string> wacands;
!     m_dict->GetTargetCandidates(srcString,cands,wacands);
      if(cands.empty()) {
        return 0;
--- 160,169 ----
      // get target phrases in string representation
      std::vector<StringTgtCand> cands;
!     std::vector<StringWordAlignmentCand> swacands;
!     std::vector<StringWordAlignmentCand> twacands;
!     //
[Moses-support] KenLM distributed with Moses
Hi,

I saw that the KenLM source code is distributed with the Moses svn and can be enabled in configure. Is anybody here using it and willing to share some experiences? Is it thread-safe, and can it be used in Moses together with SRI and IRST? Any particular advantages? Is there any more information than just the README?

any hints are very welcome
Christof
Re: [Moses-support] Printing alignment information
Hello Souhir,

are you using a recent revision of moses? My phrase tables look a bit different from yours. To see the alignment information I use the switches

-use-alignment-info -print-alignment-info -T file

and the alignment information is written to the file.

best regards
Christof

On 10/6/10 7:23 AM, Souhir Gahbiche wrote:

Hi all,

I'd like to save alignment information in the log file when decoding with moses. I called moses with the -use-alignment-info and the -print-alignment-info options but still don't get any alignment information. My phrase table looks like:

! hAyty . ||| ! Haïti » . ||| (0) (1) (2) (3) ||| (0) (1) (2) (3) ||| 1 0.187346 1 0.0661179 2.718

Are these the wrong parameters to get the alignment information?

Regards
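For readers unfamiliar with the alignment fields in Souhir's phrase table, here is a small sketch of how a field like `(0) (1) (2) (3)` could be read: one parenthesized entry per word, with an empty `()` standing for an unaligned word (that is how the old-style output loop in queryPhraseTable.cpp prints the value -1). The function name `parseWordAlignment` is hypothetical, and the handling of entries with several indices is an assumption, since no such entry appears in this thread.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: parse an old-style alignment field such as
// "(0) (1) () (3)" into one integer per word; -1 marks an unaligned word.
// Assumption: each entry holds at most one index; for something like
// "(1,2)" only the leading index would be kept by std::stoi.
std::vector<int> parseWordAlignment(const std::string& field) {
    std::vector<int> result;
    std::istringstream in(field);
    std::string tok;
    while (in >> tok) {
        // Skip anything that is not of the form "(...)".
        if (tok.size() < 2 || tok.front() != '(' || tok.back() != ')')
            continue;
        const std::string inner = tok.substr(1, tok.size() - 2);
        // std::stoi throws std::invalid_argument on non-numeric content.
        result.push_back(inner.empty() ? -1 : std::stoi(inner));
    }
    return result;
}
```

Applied to the source-side field of the example entry above, this yields the indices 0, 1, 2, 3.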
Re: [Moses-support] filter-pt doesn't work?
Hi,

it seems scores and extra have swapped their positions in the phrase table. The attached patch got me a lot further along; I changed the order in the output as well. I'm not sure whether

  if (print_cooc_counts) os << " ||| " << pp.cfe << " " << pp.cf << " " << pp.ce;
  if (print_neglog_significance) os << " ||| " << pp.nlog_pte;

still prints things in the correct order (filter-pt.cpp lines 144 - 145).

best regards
Christof

On 9/24/10 1:37 PM, Christof Pintaske wrote:

Hi,

I just updated my moses installation to trunk. Unfortunately I found that filter-pt is now crashing instead of pruning. The patch below stopped the bleeding for me. However, even with that patch I receive plenty of error messages:

No occurrences found

and the pruned table is just too small to be true. Does filter-pt get out of step because the phrase table now has 5 fields instead of 3 (my old installation is from June)? filter-pt.cpp seems to be completely unchanged compared to the June installation.

Any hints or fixes are welcome.

best regards
Christof

diff -wc sigtest-filter/filter-pt.cpp ../moses-2010-06-04/sigtest-filter/filter-pt.cpp
*** sigtest-filter/filter-pt.cpp 2010-09-24 13:19:34.0 -0700
--- ../moses-2010-06-04/sigtest-filter/filter-pt.cpp 2010-06-04 15:33:39.0 -0700
***************
*** 103,111 ****
          }
        }
      }
-     if (i != scores.end()) {
        ++i;
-     }
      char f[24];
      char *fp=f;
      while (i != scores.end() && *i != ' ') {

The attached patch:

*** filter-pt.cpp 2010-09-24 14:45:19.0 -0700
--- /export/home/moses/src/moses-2010-06-04/sigtest-filter/filter-pt.cpp 2010-06-04 15:33:39.0 -0700
***************
*** 87,105 ****
  {
    size_t pos = 0;
    std::string::size_type nextPos = str.find(SEPARATOR, pos);
!   this->f_phrase = str.substr(pos,nextPos);
!
!   pos = nextPos + SEPARATOR.size();
!   nextPos = str.find(SEPARATOR, pos);
!   this->e_phrase = str.substr(pos,nextPos-pos);
!
!   pos = nextPos + SEPARATOR.size();
    nextPos = str.find(SEPARATOR, pos);
!   this->scores = str.substr(pos,nextPos-pos);
!
!   pos = nextPos + SEPARATOR.size();
!   this->extra = str.substr(pos);
!
    int c = 0;
    std::string::iterator i=scores.begin();
    if (index > 0) {
--- 87,98 ----
  {
    size_t pos = 0;
    std::string::size_type nextPos = str.find(SEPARATOR, pos);
!   this->f_phrase = str.substr(pos,nextPos); pos = nextPos + SEPARATOR.size();
    nextPos = str.find(SEPARATOR, pos);
!   this->e_phrase = str.substr(pos,nextPos-pos); pos = nextPos + SEPARATOR.size();
!   nextPos = str.rfind(SEPARATOR);
!   this->extra = str.substr(pos, ((nextPos > pos)?(nextPos-pos):0));
!   this->scores = str.substr(nextPos + SEPARATOR.size(),std::string::npos);
    int c = 0;
    std::string::iterator i=scores.begin();
    if (index > 0) {
***************
*** 110,118 ****
          }
        }
      }
-     if (i != scores.end()) {
        ++i;
-     }
      char f[24];
      char *fp=f;
      while (i != scores.end() && *i != ' ') {
--- 103,109 ----
***************
*** 139,146 ****
  std::ostream& operator << (std::ostream& os, const PTEntry& pp)
  {
    os << pp.f_phrase << " ||| " << pp.e_phrase;
-   os << " ||| " << pp.scores;
    if (pp.extra.size()>0) os << " ||| " << pp.extra;
    if (print_cooc_counts) os << " ||| " << pp.cfe << " " << pp.cf << " " << pp.ce;
    if (print_neglog_significance) os << " ||| " << pp.nlog_pte;
    return os;
--- 130,137 ----
  std::ostream& operator << (std::ostream& os, const PTEntry& pp)
  {
    os << pp.f_phrase << " ||| " << pp.e_phrase;
    if (pp.extra.size()>0) os << " ||| " << pp.extra;
+   os << " ||| " << pp.scores;
    if (print_cooc_counts) os << " ||| " << pp.cfe << " " << pp.cf << " " << pp.ce;
    if (print_neglog_significance) os << " ||| " << pp.nlog_pte;
    return os;
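The field order that the patched parser expects (source, target, scores, then everything else as "extra") can be illustrated with a standalone sketch. The struct `PhraseTableEntry` and the function `parseEntry` are hypothetical names, not the PTEntry code from filter-pt.cpp, and the separator is assumed to be " ||| " with surrounding spaces. Unlike the patch, the sketch checks every `find` result, so a short line is rejected instead of being silently mis-split.

```cpp
#include <string>

// Hypothetical stand-in for the fields PTEntry extracts.
struct PhraseTableEntry {
    std::string source, target, scores, extra;
};

// Parse "src ||| tgt ||| scores [||| extra...]" (scores in third position).
// Returns false if fewer than three fields are present.
bool parseEntry(const std::string& line, PhraseTableEntry& out) {
    static const std::string sep = " ||| ";  // assumed separator
    const std::string::size_type p1 = line.find(sep);
    if (p1 == std::string::npos) return false;
    const std::string::size_type p2 = line.find(sep, p1 + sep.size());
    if (p2 == std::string::npos) return false;

    out.source = line.substr(0, p1);
    out.target = line.substr(p1 + sep.size(), p2 - p1 - sep.size());

    const std::string::size_type p3 = line.find(sep, p2 + sep.size());
    if (p3 == std::string::npos) {
        out.scores = line.substr(p2 + sep.size());  // no extra fields
        out.extra.clear();
    } else {
        out.scores = line.substr(p2 + sep.size(), p3 - p2 - sep.size());
        out.extra = line.substr(p3 + sep.size());   // everything that remains
    }
    return true;
}
```

A made-up line such as `a b ||| c d ||| 0.1 0.2 ||| x y` would give scores `0.1 0.2` and extra `x y`; a line in the older layout, with the scores last, would need the `rfind`-based logic from the other side of the diff.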
Re: [Moses-support] Use of unfactored training data set in moses???
Where did you get your Moses installation from? train-model.perl should be in moses/bin/scripts-*/training. At least it has always been there for me. If that's not the case, you may want to check the source or check out a new version from the repository.

hope that helps
Christof

Dear All,

I am doing research on the development of a statistical translation system for Sri Lankan local languages (Sinhala and Tamil). The available corpus is unfactored and was created by me. So I would like to know whether the script found in moses-scripts/scripts-timestamp/training/train-factored-phrase-model.perl is suitable for the training. The Moses manual specifies a script called train-model.perl for unfactored model training, which I was unable to locate.

I hope for your help as soon as possible and would really appreciate it in solving the above issue.

Thank you.
Re: [Moses-support] What is the use of the lm parameter in the model training stage?
On 5/20/10 8:12 PM, yifeng...@sina.com wrote:

In the Factored Tutorial, the first example is:

% train-model.perl \
    --corpus factored-corpus/proj-syndicate \
    --root-dir unfactored \
    --f de --e en \
    --lm 0:3:factored-corpus/surface.lm:0

I think the language model is usually used in the decoding stage in SMT. What is the use of the lm parameter, which lists a language model, in the model training stage?

I'm not sure if it's really required, but it's written to the moses.ini, which you later need for decoding. Otherwise you'd have to patch the moses.ini manually.

just my 2 cents of wisdom
Christof
Re: [Moses-support] Build Moses for translating English to Chinese.
Hi,

you may want to have a closer look at tokenizer.perl, which is used for word-breaking. It seems there is some special logic to handle English, French, and Italian, but not much else.

I'm not sure if you can or plan to reveal your findings here on the list, but at any rate I'd be very interested to learn how Chinese worked for you.

best regards
Christof

nati g wrote:

Hello,

Do we need any special scripts to build moses for translating English to Chinese?

Thanks in advance.
[Moses-support] parse-de-bitpar.perl peculiarities
Hi,

in parse-de-bitpar.perl the code sequence

  while(<STDIN>) {
    foreach (split) {
      s/\(/\*LRB\*/g;
      s/\)/\*RRB\*/g;
      print TMP $_."\n";
    }
    print TMP "\n";
  }

adds a newline after each single word. Is this required? To me it looks like bitpar parses sentences on a single line just fine.

I'm asking because this behavior causes trouble with my data down the line: annotating my English (source language) corpus with bitpar (while keeping the French target corpus plain) adds empty lines to the annotated English source. This brings the source and target files out of sync.

The root cause seems to be that parse-de-bitpar.perl internally adds a newline after each word before feeding it to bitpar. In addition, iconv may eliminate certain characters, which leads to empty lines that are eventually interpreted as a sentence break.

An (admittedly very ugly) segment like

you have been invited to community , collection1 by user1 ” , “ message from ” , and “ please use the following url to access the community .

gets parsed without any obvious error by bitpar when I feed it directly, or even after being filtered initially through iconv. However, within parse-de-bitpar.perl it first gets converted into

you have been invited to community , collection1 by user1 , message [...]

which bitpar parses into 5 sentences:

(TOP (X/domV (NP/base (CD \))(SBAR/0 (-NONE-(0))(S/fin (NP-SBJ/n3s/base+\#?NPSBJ? (PRP/n3s you) [...]
No parse for: ,
No parse for: message from
No parse for: , and
(TOP (S/fin/. (NP-SBJ/n3s/base+\#?NPSBJ? (NN please))(VP/n3s_?NPSBJ? (VVP/nst use)(NP/base (DT/the the)(JJ following)[...]

parse-de-bitpar.perl changes each "No parse for" into an empty line. Since 1 sentence gets unfolded into 5 lines, the English source and the unannotated target get out of sync.

any comments are welcome

best regards
Christof
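To make the conversion step described above easier to follow, here is a sketch in C++ of what the quoted Perl loop does to one input sentence: every whitespace-separated token goes on its own line with brackets escaped, and a blank line closes the sentence. The function name `toBitparInput` is hypothetical; this is a transliteration for illustration, not code from the wrapper.

```cpp
#include <sstream>
#include <string>

// Hypothetical transliteration of the Perl loop: one token per line,
// "(" -> "*LRB*", ")" -> "*RRB*", and a blank line ends the sentence.
std::string toBitparInput(const std::string& sentence) {
    std::istringstream in(sentence);
    std::string tok;
    std::string out;
    while (in >> tok) {
        for (char c : tok) {
            if (c == '(')      out += "*LRB*";
            else if (c == ')') out += "*RRB*";
            else               out += c;
        }
        out += '\n';   // newline after each single word
    }
    out += '\n';       // blank line as sentence terminator
    return out;
}
```

In this layout a blank line is the only sentence boundary, so a token that becomes empty somewhere between this step and the parser (for instance after iconv drops a character, as suspected above) would look exactly like an extra boundary. That is consistent with one input segment coming back as several parses, though the sketch itself does not prove that this is what happens inside the wrapper.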
[Moses-support] moses_chart: bug in parse-de-bitpar.perl ?
Hi,

I'm playing with bitpar for parsing and annotating English content. I modified parse-de-bitpar.perl to use the TraceParser grammar files instead of the German Tiger files. When I tried to annotate my corpus, parse-de-bitpar.perl died on me on two occasions:

1. A tree like (a (b (c))(d)) does not get parsed correctly. parse-de-bitpar.perl chokes on the double (or multiple) closing brackets c))
2. Quoted brackets are not parsed correctly. bitpar threw something like \\(xyz\)\ at parse-de-bitpar.perl, which brought it down.

I can provide exact examples if anybody is interested. The patch below did it for me.

Does anybody have experience to share regarding syntax annotation? Is collins the way to go for English?

best regards
Christof

diff -w local/bin/parse-bitpar.perl ~/libexec/moses-chart/bin/scripts/training/wrappers/parse-de-bitpar.perl
61c55
< my ($label,$rest) = split(/(?<!\\)[\)\( ]/,substr($line,$i+1));
---
> my ($label,$rest) = split(/[\( ]/,substr($line,$i+1));
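The idea behind the patched regular expression (split at a bracket or space only when it is not preceded by a backslash) can be sketched without regex lookbehind, which std::regex does not offer. The function name `splitLabel` is hypothetical. One simplification to be aware of: the Perl `split` yields every piece and the wrapper keeps the first two, whereas this sketch returns the label and the whole remainder.

```cpp
#include <string>
#include <utility>

// Hypothetical helper: split at the first '(' , ')' or ' ' that is not
// escaped by a directly preceding backslash. Returns (label, remainder);
// if no such delimiter exists, the whole input is the label.
std::pair<std::string, std::string> splitLabel(const std::string& s) {
    for (std::string::size_type i = 0; i < s.size(); ++i) {
        const bool isDelim = (s[i] == '(' || s[i] == ')' || s[i] == ' ');
        const bool escaped = (i > 0 && s[i - 1] == '\\');
        if (isDelim && !escaped)
            return std::make_pair(s.substr(0, i), s.substr(i + 1));
    }
    return std::make_pair(s, std::string());
}
```

For an input like `NP (DT the)` the label is `NP`; for a quoted token such as `\(xyz\) rest` the escaped brackets stay inside the label and the split happens at the space.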
[Moses-support] moses_chart: usage information for phrase_extract
Hi,

here's a minor discovery in phrase_extract: it does not give any usage information, even though it seems somebody had the intention to do so:

  if (argc < 1) {
    cerr << syntax: relax-parse in-parse out-parse [ --LeftBinarize | ---RightBinarize | --SAMT 1-4 ] << endl;
    exit(1);
  }

argc is of course always 1 or greater, so this branch is never taken. It would be great if phrase_extract supported something like --help or -h. Or maybe it could require at least one argument and then print the usage on

  if (argc < 2)

Of course this is just minor stuff.

best regards
Christof
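Both suggestions from this message (print the usage when no argument is given, and honor --help or -h) fit into one small check. The following is only a sketch with a hypothetical function name `wantsUsage`; it is not taken from the tool's source.

```cpp
#include <cstring>

// Hypothetical helper: true if the usage text should be printed, i.e.
// no argument beyond the program name was given, or one of the
// arguments is "--help" or "-h".
bool wantsUsage(int argc, const char* const* argv) {
    if (argc < 2) return true;  // argv[0] is always present, hence "< 2"
    for (int i = 1; i < argc; ++i) {
        if (std::strcmp(argv[i], "--help") == 0 ||
            std::strcmp(argv[i], "-h") == 0)
            return true;
    }
    return false;
}
```

In `main` this would replace the unreachable test: call `wantsUsage(argc, argv)` first and, if it returns true, print the syntax line to `cerr` and exit with a non-zero status.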
Re: [Moses-support] moses_chart: tuning with mert-moses-new.pl doesn't change the moses.ini
Hieu Hoang wrote:

hi christof

the mert-moses-new hasn't been modified to do chart decoding yet, only the original mert-moses.pl. There are just a few small changes that need to be done, but no one has gotten round to doing it yet

Many, many thanks, that did it! I used mert-moses.pl and it completed tuning overnight!

Results (BLEU scores) for my hierarchical phrase model are slightly worse than with the good old phrase-based decoder. Does anybody have experience with whether tagging the (English) source with an annotating parser (bitpar and collins are mentioned in the documentation) improves things?

best regards
Christof

On 01/02/2010 19:13, Christof Pintaske wrote:

Hi,

while running mert-moses-new.pl to tune the chart-decoder I get this output:

[...]
The decoder returns the scores in this order: lm tm tm tm tm tm tm w
[...]
Executing: /export/home/moses/libexec/moses-chart/bin/extractor --scfile run1.scores.dat [...]
The decoder produced also some 'tm' scores, but we do not know the ranges for them, no way to optimize them

It seems it dies on these 'tm' scores. Is there a way to prevent the decoder from producing these scores?

regards
Christof

Christof Pintaske wrote:

Hi,

I'm running

mert-moses-new.pl ~/en-fr_chart/tuning/token_lowercase.en ~/en-fr_chart/tuning/token_lowercase.fr /export/home/moses/libexec/moses-chart/bin/moses_chart ~/en-fr_chart/training/pm/model/moses.ini --mertdir=/export/home/moses/libexec/moses-chart/bin --working-dir=~/en-fr_chart/tuning --decoder-flags=-v 0 --no-filter-phrase-table

for tuning. It performs exactly one iteration and writes a run1.moses.ini in the tuning directory. That moses.ini is almost identical to the one that I got after training (see below). Am I missing the obvious?

any hints are welcome
Christof
Re: [Moses-support] moses for haitian relief
Hi Christopher,

when I tried to build srilm on an amd64 machine running Linux, I found that srilm used the common/Makefile.machine.i686 Makefile. This, in turn, defines the compiler flags

GCC_FLAGS = -m32 -mtune=pentium3 -W

I removed the -m32 -mtune=pentium from the flags to build regular 64-bit code. After that I could link the result to moses just fine.

hope that helps
Christof

christopher taylor wrote:

hello everyone!

i'm currently trying to build an instance of moses to support crisiscommons.org's machine translation project (i'm currently the PM). i really want to give moses a spin *but* i'm having issues building it. my build trouble is related to liboolm.a - here's the output from my compilation:

Making all in moses-cmd/src
make[2]: Entering directory `../mt/moses/moses-cmd/src'
g++ -g -O2 -L..//mt/srilm/lib/i686 -L..//mt/irstlm//lib/x86_64 -o moses Main.o mbr.o IOWrapper.o TranslationAnalysis.o -L../../moses/src -lmoses -loolm -ldstruct -lmisc -lirstlm -lz
/usr/bin/ld: skipping incompatible ../mt/srilm/lib/i686/liboolm.a when searching for -loolm
/usr/bin/ld: cannot find -loolm
collect2: ld returned 1 exit status
make[2]: *** [moses] Error 1
make[2]: Leaving directory `..//mt/moses/moses-cmd/src'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `..//mt/moses'
make: *** [all] Error 2

thanks so much for your help!
chris taylor
Re: [Moses-support] Moses server dies when loading language model
I'd think that you did not compile or link successfully against SRILM. As a consequence, LanguageModelInternal tries to load the language model, instead of the SRILM implementation being used.

just my 2 cents
Christof

Panagiotis Kanavos wrote:

Hi,

I downloaded moses from svn and followed the steps to build the moses server with multithreading. I think I built it successfully, but when I run it I get this error when it starts loading the language model:

mosesserver: LanguageModelInternal.cpp:22: virtual bool Moses::LanguageModelInternal::Load(const std::string&, Moses::FactorType, float, size_t): Assertion `nGramOrder <= 3' failed.

I had already installed moses on my Ubuntu 64-bit server using Eric Nichols' packages, and that installation still runs fine.

TIA
Re: [Moses-support] moses_chart and recaser/phrase-table ?
Hello Hieu,

I actually used the chart decoder and scripts to train the recaser, and consequently I used the chart decoder for the recaser decoding as well. It seems that the chart decoder scripts still write an old-style moses.ini when used for recaser training, but the chart decoder is not able to read it.

In Moses::StaticData::LoadPhraseTables in StaticData.cpp:832

  string filePath= token[4];

it expects [ttable-file] to have at least 5 entries, hardcoded. So the recaser scripts are actually not usable with the chart decoder. You may want to add an assertion for that in the code.

As a side note, I saw that the chart decoder always links to libpthread, regardless of --enable-threads. Is the chart decoder multithreaded? And can it still be used with the irstlm language model?

best regards
Christof

Hieu Hoang wrote:

hi christof

there are small fiddly changes to the ini file format, so the chart decoder isn't backwardly compatible. You should use the trunk decoder if the data was trained using the trunk scripts

On 22/01/2010 00:20, Christof Pintaske wrote:

Hi,

is the moses/moses_chart executable from the mt3_chart branch supposed to be usable as a recaser (that is, using a phrase-table)? When I do so I get a SEGV (see stack trace at the very end). Working on the same files with moses from svn/trunk works fine.

I read somewhere that irstlm is not thread-safe, and in the debug output I see that a new thread has been started. Is that the problem? I thought I'd be safe because I did *not* configure moses with --enable-threads nor with boost.

many thanks
Christof

(gdb) run -f ~/data/engine/en-fr_chart/recaser/moses.ini x
Starting program: /export/home/moses/libexec/moses-chart/bin/moses -f ~/data/engine/en-fr_chart/recaser/moses.ini x
[Thread debugging using libthread_db enabled]
Defined parameters (per moses.ini or switch):
  config: /export/home/moses/data/engine/en-fr_chart/recaser/moses.ini
  distortion-limit: 6
  input-factors: 0
  lmodel-file: 1 0 3 /export/home/moses/data/engine/en-fr_chart/recaser/cased.irstlm.gz
  mapping: 0 T 0
  ttable-file: 0 0 5 /export/home/moses/data/engine/en-fr_chart/recaser/phrase-table.gz
  ttable-limit: 20
  weight-d: 0.6
  weight-l: 0.5000
  weight-t: 0.2 0.2 0.2 0.2 0.2
  weight-w: -1
Added 0 Distortion 0-0
Added 1 !UnknownWordPenalty 1-1
Added 2 WordPenalty 2-2
Loading lexical distortion models... have 0 models
Start loading LanguageModel /export/home/moses/data/engine/en-fr_chart/recaser/cased.irstlm.gz : [0.000] seconds
Added 3 LanguageModel 3-3
In LanguageModelIRST::Load: nGramOrder = 3
Loading LM file (no MAP)
iARPA
loadtxt()
1-grams: reading 8178 entries
2-grams: reading 37042 entries
Detaching after fork from child process 4881.
3-grams: reading 61742 entries
Detaching after fork from child process 4882.
done
OOV code is 8177
OOV code is 8177
IRST: m_unknownId=8177
Finished loading LanguageModels : [1.000] seconds
[New Thread 0x2b0788eaa940 (LWP 4878)]

Program received signal SIGSEGV, Segmentation fault.
0x00398a29c8c8 in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string () from /usr/lib64/libstdc++.so.6
(gdb) where
#0 0x00398a29c8c8 in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string () from /usr/lib64/libstdc++.so.6
#1 0x0045ca7a in Moses::StaticData::LoadPhraseTables (this=0x74c8e0) at StaticData.cpp:832
#2 0x004646ea in Moses::StaticData::LoadData (this=0x74c8e0, parameter=<value optimized out>) at StaticData.cpp:406
#3 0x00406fd1 in main (argc=3, argv=0x7fffeccfc5d8) at ../../moses/src/StaticData.h:217
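The check suggested in this thread (verify the number of entries before reading `token[4]`) could look roughly like the sketch below. The function name `extractFilePath` is hypothetical and the entry is simply split on whitespace; the sketch makes no claim about what the individual fields mean, only that the fifth one is read as the file path, as at StaticData.cpp:832.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical guard: split a [ttable-file] entry on whitespace and only
// read the fifth token when it exists. Returns false for shorter entries
// instead of indexing past the end of the vector.
bool extractFilePath(const std::string& entry, std::string& filePath) {
    std::istringstream in(entry);
    std::vector<std::string> token;
    std::string t;
    while (in >> t) token.push_back(t);
    if (token.size() < 5) return false;  // old-style entry: too few fields
    filePath = token[4];
    return true;
}
```

The recaser entry shown in the gdb session, `0 0 5` followed by the phrase-table path, has only four fields, so this guard would report it as unusable rather than crash in the std::string copy constructor.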
[Moses-support] moses-chart: buggy line in extract.o.sorted
Hi,

my extract.o.gz and extract.o.sorted produce a large number of "buggy line" error messages. For example:

buggy line (o_previous:): [X] , [X] and ||| [X] , [X] et |||
buggy line (o_following:): [X] , [X] and ||| [X] , [X] et |||

In fact, extract has generated the respective line(s) in extract.o without a mono, other or swap attribute, which seems to trigger these complaints. For example:

[X] interface ( cli ||| [X] de l' interface ||| other swap
[X] interface [X] ||| [X] de [X] interface |||
[X] [X] cli ||| [X] de l' [X] |||
[X] interface ( cli ) ||| [X] de l' interface ||| other swap
[X] interface ( [X] ||| [X] de [X] interface |||
[X] interface [X] ) ||| [X] de [X] interface |||
[X] [X] cli ) ||| [X] de l' [X] |||

Is this something I can ignore? I'd really like to understand what's going on here :-)

many thanks and best regards
Christof
Re: [Moses-support] moses-chart: buggy line in extract.o.sorted
Philipp Koehn wrote:

Hi, it seems you are using hierarchical rules and lexicalized reordering at the same time. This is asking for trouble...

Oops, I had blindly carried over all the command-line arguments from my phrase-based training. Now it works.

thanks a lot
Christof

-phi

On Wed, Jan 20, 2010 at 11:05 PM, Christof Pintaske christof.pinta...@sun.com wrote:

Hi,

my extract.o.gz and extract.o.sorted produce a large number of "buggy line" error messages. For example:

buggy line (o_previous:): [X] , [X] and ||| [X] , [X] et |||
buggy line (o_following:): [X] , [X] and ||| [X] , [X] et |||

In fact, extract has generated the respective line(s) in extract.o without a mono, other or swap attribute, which seems to trigger these complaints. For example:

[X] interface ( cli ||| [X] de l' interface ||| other swap
[X] interface [X] ||| [X] de [X] interface |||
[X] [X] cli ||| [X] de l' [X] |||
[X] interface ( cli ) ||| [X] de l' interface ||| other swap
[X] interface ( [X] ||| [X] de [X] interface |||
[X] interface [X] ) ||| [X] de [X] interface |||
[X] [X] cli ) ||| [X] de l' [X] |||

Is this something I can ignore? I'd really like to understand what's going on here :-)

many thanks and best regards
Christof
Re: [Moses-support] prune-lm in endless loop
Hi,

just for the record: recompiling prune-lm with the additional compiler option -fno-strict-aliasing solved the problem. It seems gcc 4.1.2 didn't like the magic casting that's used in some of the source files.

Is there a place where these kinds of issues are documented?

best regards
Christof

Christof Pintaske wrote:

Hi,

I created a 3-gram LM with the irstlm toolkit (5.0.22). The LM has about 25M entries:

ngram 1= 300209
ngram 2= 4864097
ngram 3= 20336549

I tried to prune it with prune-lm on a Linux machine:

prune-lm --threshold=1e-6,1e-6 sun.irstlm.gz sun.pruned.irlstlm x.out

In the output x.out I get repeated error messages

ng: qu0 ts=1.00059 tbs=0.0196106 k=0 ns=20

probably more than 100M identical ones. After running the pruning overnight, the stderr output reached 100GB in size and I stopped the process.

Just looking at the source code, I assume that lmtable::wdprune() loops endlessly over the prune: goto statement. Are there any problems with the pscale() routine?

Any hints on where to look are highly appreciated.

best regards
Christof
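As background on why -fno-strict-aliasing can change behavior: reading an object through a pointer of an unrelated type is undefined in C++, and an optimizing compiler may reorder or drop such accesses. Whether the casts in irstlm are of exactly this kind is an assumption; the sketch below only shows the general pattern with a hypothetical function `floatBits`, using memcpy as the well-defined alternative to a pointer cast.

```cpp
#include <cstdint>
#include <cstring>

// Return the bit pattern of a float. The commented-out cast is the kind
// of "magic casting" that breaks the strict-aliasing rule; memcpy copies
// the bytes and is well defined under any optimization setting.
std::uint32_t floatBits(float f) {
    static_assert(sizeof(std::uint32_t) == sizeof(float),
                  "sketch assumes a 32-bit float");
    // Undefined behavior: return *reinterpret_cast<std::uint32_t*>(&f);
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

On platforms with IEEE 754 single precision, `floatBits(1.0f)` is 0x3F800000. Compilers typically turn the memcpy into a plain register move, so the safe form costs nothing at run time.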