Thanks Ken for all your feedback, 

One more question. I'm using moses with boost. I uncommented the line
#define USE_BOOST in kenlm/util/string_piece.hh and recompiled Moses
without problems. 

Then, I uncommented #define USE_ICU and ./configure fails with the error
log below. libicu-dev and libicu42 are installed on my system. Also, each
compile started with a clean moses download.

Is USE_ICU usable or necessary with Moses? 

Thanks,
Tom


configure: Using Boost library
checking for boostlib >= 1.36.0... yes
configure: Building threaded moses
checking whether the Boost::Thread library is available... yes
checking for exit in -lboost_thread-mt... yes
checking Ngram.h usability... yes
checking Ngram.h presence... yes
checking for Ngram.h... yes
checking for trigram_init in -loolm... yes
checking n_gram.h usability... yes
checking n_gram.h presence... yes
checking for n_gram.h... yes
checking lm/ngram.hh usability... no
checking lm/ngram.hh presence... yes
checking for lm/ngram.hh... no
configure: WARNING: lm/ngram.hh: present but cannot be compiled
configure: WARNING: lm/ngram.hh:     check for missing prerequisite headers?
configure: WARNING: lm/ngram.hh: see the Autoconf documentation
configure: WARNING: lm/ngram.hh:     section "Present But Cannot Be Compiled"
configure: WARNING: lm/ngram.hh: proceeding with the compiler's result
configure: error: Cannot find KEN-LM in yes
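
Note on the log above: Autoconf prints "present but cannot be compiled"
when the preprocessor finds a header but a test compile that includes it
fails. The compiler's actual error message is recorded in config.log. Below
is a minimal Python sketch for pulling that section out. It assumes the
usual config.log layout (a "checking ... usability" line, then the compiler
output, then a "result:" line), and failed_check_block is a hypothetical
helper name, not part of Moses or Autoconf:

```python
import re


def failed_check_block(log_text, header):
    """Return the config.log lines for the '<header> usability' check.

    Assumes the common layout: a 'checking <header> usability' line,
    followed by compiler output, closed by a 'configure:NNN: result:' line.
    """
    lines = log_text.splitlines()
    start = None
    for i, line in enumerate(lines):
        if start is None:
            # Look for the start of the usability test for this header.
            if re.search(r"checking " + re.escape(header) + r" usability", line):
                start = i
        elif re.match(r"configure:\d+: result: ", line):
            # The first result line after the start closes the block.
            return lines[start:i + 1]
    # No closing result line found: return whatever follows the start, if any.
    return lines[start:] if start is not None else []
```

Running it on the contents of config.log with "lm/ngram.hh" as the header
should show which include or symbol the test compile could not resolve.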




On Tue, 26 Oct 2010 12:48:13 -0400, Kenneth Heafield <mo...@kheafield.com>
wrote:
> Yes, I require <s> and </s> to appear in your ARPA.  These tags are
> important from an output quality perspective (BLEU etc).  I'll put that
> in the documentation when I get around to writing it, but personally
> think IRST should include them by default.
> 
> Kenneth
> 
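
A quick way to check an ARPA file for the two boundary tags before loading
it. This is only a sketch: it assumes the standard ARPA layout (a \1-grams:
header followed by lines of log-probability, word and optional backoff), and
missing_boundary_tags is a made-up helper, not part of KenLM:

```python
def missing_boundary_tags(arpa_lines):
    """Return whichever of '<s>' and '</s>' is absent from the unigram section."""
    wanted = {"<s>", "</s>"}
    in_unigrams = False
    for raw in arpa_lines:
        line = raw.strip()
        if line == "\\1-grams:":
            in_unigrams = True
            continue
        if in_unigrams:
            if line.startswith("\\"):
                # Next section header or the end marker: unigrams are done.
                break
            fields = line.split()
            if len(fields) >= 2:
                # fields[0] is the log probability, fields[1] is the word.
                wanted.discard(fields[1])
    return wanted
```

An empty set means both tags are present. Pass it an open file object or any
iterable of lines.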
> On 10/26/10 12:30, supp...@precisiontranslationtools.com wrote:
>> Thanks Ken. I tested it and it works. 
>> 
>> FYI, on my first attempt there was a different error. Something about the
>> <s> token (word?) was missing. I added the <s></s> tags and re-ran irstlm's
>> build-lm.sh script with option -b (Include sentence boundary n-grams) and
>> the error disappeared.
>> 
>> It's pretty fast now. I look forward to testing the optimized code.
>> 
>> Tom
>> 
>> 
>> 
>> On Tue, 26 Oct 2010 10:18:17 -0400, Kenneth Heafield
>> <mo...@kheafield.com>
>> wrote:
>>> I've fixed this in revision 3657 and tested that it works with a toy
>>> IRSTLM example.
>>>
>>> Sorry about that,
>>>
>>> Kenneth
>>>
>>> P.S. a faster version is under code review and coming soon.
>>>
>>> On 10/26/10 03:57, Nicola Bertoldi wrote:
>>>> The empty line after each ngram-block is not mandatory in the ARPA format
>>>> (see http://www.speech.sri.com/projects/srilm/manpages/ngram-format.5.html)
>>>> and IRSTLM does not produce it.
>>>>
>>>>
>>>> best regards,
>>>> Nicola Bertoldi
>>>>
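
For builds older than the fix in revision 3657, one workaround is to pad the
ARPA file yourself. A sketch, assuming section headers look like \N-grams:
and the file ends with \end\; pad_arpa_sections is a hypothetical helper,
not an IRSTLM or KenLM tool:

```python
import re

# Matches an n-gram section header such as \3-grams: or the final \end\ marker.
SECTION = re.compile(r"^\\(\d+-grams:|end\\)\s*$")


def pad_arpa_sections(lines):
    """Yield ARPA lines, adding a blank line before any section header
    or end marker that is not already preceded by one."""
    prev_blank = True  # Treat the start of the file as "blank" so nothing is prepended.
    for raw in lines:
        line = raw.rstrip("\n")
        if SECTION.match(line) and not prev_blank:
            yield ""
        yield line
        prev_blank = (line.strip() == "")
```

The generator leaves already padded files unchanged, so it is safe to run on
ARPA output from any toolkit.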
>>>> On Oct 26, 2010, at 9:42 AM, <supp...@precisiontranslationtools.com>
>>>> <supp...@precisiontranslationtools.com> wrote:
>>>>
>>>>> Hi Ken,
>>>>>
>>>>> I created an iARPA file with IRSTLM using the options -n 3 (3-grams), -b
>>>>> (include the <s> sentence boundary) and -d (subdictionary for ngrams).
>>>>> Then, I used IRSTLM's compile-lm with --text yes to convert to ARPA
>>>>> format.
>>>>>
>>>>> Finally, I ran build_binary to binarize the ARPA format for KenLM. I got
>>>>> the following error:
>>>>>
>>>>> $ build_binary arpa.en.lm arpa.en.binary
>>>>> Reading lm.en.lm
>>>>>
>>>>> ----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
>>>>>
>>>>> terminate called after throwing an instance of 'lm::FormatLoadException'
>>>>>   what():  Expected blank line after 3-grams at byte 22348989 in file arpa.en.lm
>>>>> Aborted
>>>>>
>>>>> What am I missing?
>>>>>
>>>>> Thanks,
>>>>> Tom
>>>>>
>>>>>
>>>>> On Fri, 22 Oct 2010 10:15:21 -0400, Kenneth Heafield
>>>>> <mo...@kheafield.com>
>>>>> wrote:
>>>>>> KenLM is inference-only.  It cannot create ARPA files.  So you'll need
>>>>>> to use your favorite toolkit to generate the ARPA.
>>>>>>
>>>>>> On 10/22/10 07:52, supp...@precisiontranslationtools.com wrote:
>>>>>>> Thanks Ken. Nice work.
>>>>>>>
>>>>>>> Is there a way to train the ARPA formatted LM with KenLM, or do we need
>>>>>>> to train with another tool, like SRILM or convert IRSTLM to full ARPA
>>>>>>> format?
>>>>>>>
>>>>>>> Thanks again,
>>>>>>> Tom
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Mon, 18 Oct 2010 20:31:38 -0400, Kenneth Heafield
>>>>>>> <mo...@kheafield.com>
>>>>>>> wrote:
>>>>>>>> Hi Moses,
>>>>>>>>
>>>>>>>>     Introducing kenlm in Moses trunk.  You no longer need to download a
>>>>>>>> separate language model to use Moses; it's distributed with Moses and
>>>>>>>> compiled in by default on UNIX.  This is threadsafe language model
>>>>>>>> inference code that returns the same probabilities as SRI (up to
>>>>>>>> floating point rounding).  It loads ARPA files in 2/3 the time SRI takes
>>>>>>>> and uses less memory too.  Using kenlm is simple: in your [lmodel-file]
>>>>>>>> section, change the first digit to 8.  For example,
>>>>>>>>
>>>>>>>> "0 0 2 foo.arpa" changes to "8 0 2 foo.arpa"
>>>>>>>>
>>>>>>>>     For even faster loading, use the binary format:
>>>>>>>>
>>>>>>>> kenlm/build_binary foo.arpa foo.binary
>>>>>>>>
>>>>>>>> then simply provide the binary filename in your moses.ini e.g.
>>>>>>>> "8 0 2 foo.binary"; it auto detects binary files using magic bytes at
>>>>>>>> the beginning.
>>>>>>>>
>>>>>>>>     The code is ready for use and provides correct results.  Inference is
>>>>>>>> slower than it should be due to inefficiencies in the Moses-side wrapper
>>>>>>>> code (it does a vocab lookup for all 5 words every time).  I'm working
>>>>>>>> on it and once this is done I'll post some benchmarks against SRI and
>>>>>>>> IRST.  The binary format is subject to change, but contains a version
>>>>>>>> number, so on the very rare occasions when it changes, new versions will
>>>>>>>> tell you to rebuild your binary files.  Windows is currently not
>>>>>>>> supported (it uses mmap) though I welcome contributions using #ifdef and
>>>>>>>> CreateFileMapping.
>>>>>>>>
>>>>>>>>     Have fun and let me know about your experiences with it.
>>>>>>>>
>>>>>>>> "Ken"
>>>>>>>> _______________________________________________
>>>>>>>> Moses-support mailing list
>>>>>>>> Moses-support@mit.edu
>>>>>>>> http://mailman.mit.edu/mailman/listinfo/moses-support
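
The [lmodel-file] change from the announcement can be scripted. A minimal
sketch, assuming each entry in that section has four whitespace-separated
fields as in the "0 0 2 foo.arpa" example and that lines starting with # are
comments; use_kenlm is a hypothetical helper name, not a Moses script:

```python
def use_kenlm(ini_lines):
    """Return moses.ini lines with the LM type (first field) set to 8
    for every four-field entry in the [lmodel-file] section."""
    out = []
    in_lm = False
    for raw in ini_lines:
        line = raw.rstrip("\n")
        stripped = line.strip()
        if stripped.startswith("["):
            # A section header: track whether we are inside [lmodel-file].
            in_lm = (stripped == "[lmodel-file]")
        elif in_lm and stripped and not stripped.startswith("#"):
            fields = stripped.split()
            if len(fields) == 4:
                # Replace only the LM type; keep factor, order and filename.
                line = " ".join(["8"] + fields[1:])
        out.append(line)
    return out
```

Entries in other sections are passed through untouched, and the same
function works whether the filename points at an ARPA or a binary file.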
