Hi Matt,
Since I receive the list as digest emails and the next batch has not
arrived yet, I'll just reply to the thread anyway.
Glad to see that the file downloaded fine.
You stated "Something is obviously wrong. What BLEU scores did you get on
your tuning and testing sets? Can you give me the first ten lines of
grammar.gz and lm.gz? Impossible to do sanity checks without the raw
grammars and LMs."
OK, so:
1) Yes, something is wrong; there is no doubt about that!
2) Regarding BLEU scores, I can't find an individual 'bleu' file for tuning;
all I could find was the file at $rundir/tune/mert.log, which states
----------------------------------------------------
Z-MERT run ended @ Thu Oct 27 15:07:36 PDT 2016
----------------------------------------------------
FINAL lambda: {0.32811579836756727, 0.13451331312861647, 3.854583349699589,
1.9529831007694043, -0.14803956975598645, 0.9677828073898326,
1.9314652655618239, 0.04035458882374297, -6.304458466023295, 1.0,
-3.9877496903616656, 0.8720746273758616} (BLEU: 0.5627727651628539)
Warning: after normalization, lambda[12]=0.8721 is outside its critical
value range.
With regard to testing, I was able to find the file at $rundir/test/bleu,
which contains
Processing 5000 sentences...
Evaluating candidate translations in plain file
/usr/local/joshua_resources/russian_experiments/exp4/test/output...
BLEU_precision(1) = 94856 / 144986 = 0.6542
BLEU_precision(2) = 77801 / 139986 = 0.5558
BLEU_precision(3) = 68491 / 134986 = 0.5074
BLEU_precision(4) = 60862 / 129986 = 0.4682
BLEU_precision = 0.5421
Length of candidate corpus = 144986
Effective length of reference corpus = 143814
BLEU_BP = 1.0000
=> BLEU = 0.5421
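As a quick sanity check on the figures above, here is a minimal Python sketch that recombines the reported n-gram counts into a corpus-level score. It assumes the scorer uses the standard BLEU definition (brevity penalty times the geometric mean of the four precisions); the function name `bleu_from_counts` is my own and is not part of Joshua.

```python
import math

# Counts copied from the $rundir/test/bleu output above.
matches = [94856, 77801, 68491, 60862]
totals = [144986, 139986, 134986, 129986]
candidate_len = 144986
reference_len = 143814


def bleu_from_counts(matches, totals, candidate_len, reference_len):
    """Standard corpus BLEU: brevity penalty * geometric mean of precisions.

    Assumes every match count is non-zero (log(0) is undefined).
    """
    precisions = [m / t for m, t in zip(matches, totals)]
    log_mean = sum(math.log(p) for p in precisions) / len(precisions)
    # The brevity penalty only applies when the candidate corpus is
    # shorter than the effective reference length.
    if candidate_len >= reference_len:
        bp = 1.0
    else:
        bp = math.exp(1.0 - reference_len / candidate_len)
    return bp * math.exp(log_mean)


bleu = bleu_from_counts(matches, totals, candidate_len, reference_len)
print(f"BLEU = {bleu:.4f}")
```

If the assumption holds, this should agree with the 0.5421 reported in the file, which would suggest the score is computed correctly from the counts and that any problem lies further upstream.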
3) The first ten lines of grammar.gz and lm.gz are below.
First, grammar.gz is a compressed (binary) file; however, if I unzip it and
take the top 10 lines I get the following
lmcgibbn@LMC-056430 ~/Desktop $ head -10 grammar
[X] ||| "application ||| "application ||| 0 0.69315 1 1.00000 0 0.69315 ||| 0-0
[X] ||| "application server ||| "application server ||| 1.42471 0.74547 1 1.00000 0 0.69315 ||| 0-0 1-1
[X] ||| "application server " ||| "application server " ||| 1.49321 1.08115 1 1.00000 0 0.69315 ||| 0-0 1-1 2-2
[X] ||| "application server " mode ||| режиме "application server " ||| 2.48813 1.75963 1 1.00000 0 0.69315 ||| 0-1 1-2 2-3 3-0
[X] ||| "application server " mode . ||| режиме "application server " . ||| 2.50691 1.77390 1 1.00000 0 0.69315 ||| 0-1 1-2 2-3 3-0 4-4
[X] ||| "application server " mode [X,1] ||| режиме "application server " [X,1] ||| 2.48813 1.75963 1 1.00000 0 0.69315 ||| 0-1 1-2 2-3 3-0
[X] ||| "application server " [X,1] ||| [X,1] "application server " ||| 1.49321 1.08115 1 1.00000 0 0.69315 ||| 0-1 1-2 2-3
[X] ||| "application server " [X,1] . ||| [X,1] "application server " . ||| 1.51199 1.09541 1 1.00000 0 0.69315 ||| 0-1 1-2 2-3 4-4
[X] ||| "application server [X,1] ||| "application server [X,1] ||| 1.42471 0.74547 1 1.00000 0 0.69315 ||| 0-0 1-1
[X] ||| "application server [X,1] mode ||| режиме "application server [X,1] ||| 2.41964 1.42395 1 1.00000 0 0.69315 ||| 0-1 1-2 3-0
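In case it helps with sanity checks on the grammar, here is a rough Python sketch that splits one of the rules above into its fields. It assumes each rule has five " ||| "-separated fields in the order left-hand side, source, target, feature values, word alignment; that reading is inferred from the lines above, and the names `Rule` and `parse_rule` are hypothetical helpers, not Joshua code.

```python
from typing import List, NamedTuple, Tuple


class Rule(NamedTuple):
    lhs: str
    source: str
    target: str
    features: List[float]
    alignment: List[Tuple[int, int]]


def parse_rule(line: str) -> Rule:
    """Split one grammar line into fields (assumed five-field layout)."""
    fields = [field.strip() for field in line.rstrip("\n").split("|||")]
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {line!r}")
    lhs, source, target, feats, align = fields
    features = [float(value) for value in feats.split()]
    # Each alignment point is written as two integers joined by a hyphen.
    alignment = [
        (int(left), int(right))
        for left, right in (pair.split("-") for pair in align.split())
    ]
    return Rule(lhs, source, target, features, alignment)


# One rule copied from the listing above.
line = ('[X] ||| "application server " mode ||| режиме "application server " '
        '||| 2.48813 1.75963 1 1.00000 0 0.69315 ||| 0-1 1-2 2-3 3-0')
rule = parse_rule(line)
print(rule.source, "=>", rule.target)
print(len(rule.features), "feature values,", len(rule.alignment), "alignment points")
```

Run over the whole unzipped file, something like this could be used to count how many rules have identical source and target sides, which may or may not be relevant to the problem.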
If I do the same with lm.gz I get the following
lmcgibbn@LMC-056430 ~/Desktop $ head -10 lm
# Input file: fd 3
# Token count: 17545420
# Smoothing: Modified Kneser-Ney
\data\
ngram 1=632340
ngram 2=5054267
ngram 3=10057059
ngram 4=12250396
ngram 5=12726506
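The header above can also be read programmatically. Below is a small Python sketch that pulls the n-gram counts out of an ARPA-style header; it assumes the counts appear as "ngram N=count" lines after the \data\ marker, as in the output above, and `ngram_counts` is a hypothetical helper name.

```python
import re


def ngram_counts(lines):
    """Return {order: count} from the header of an ARPA-style LM file."""
    counts = {}
    in_data = False
    for raw in lines:
        line = raw.strip()
        if line == "\\data\\":
            in_data = True
            continue
        if not in_data:
            continue  # skip comment lines before the \data\ marker
        match = re.fullmatch(r"ngram (\d+)=(\d+)", line)
        if match:
            counts[int(match.group(1))] = int(match.group(2))
        elif line:
            break  # first non-blank, non-count line ends the header
    return counts


# Header lines copied from the `head -10 lm` output above.
header = [
    "# Input file: fd 3",
    "# Token count: 17545420",
    "# Smoothing: Modified Kneser-Ney",
    "\\data\\",
    "ngram 1=632340",
    "ngram 2=5054267",
    "ngram 3=10057059",
    "ngram 4=12250396",
    "ngram 5=12726506",
]
counts = ngram_counts(header)
print("LM order:", max(counts), "| unigram types:", counts[1])
```

To run it against the compressed file directly, the lines could be streamed with `gzip.open("lm.gz", "rt", encoding="utf-8")` instead of unzipping first (the path here is just the file name used in this thread).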
@Matt, if it would aid in debugging what is going on, I can make absolutely
everything I have available to you as a tar.gz.
Thanks
Lewis
On Wed, Nov 2, 2016 at 11:19 AM, lewis john mcgibbney <[email protected]>
wrote:
> Hi Matt,
> Thanks for looking into this.
>
> On Wed, Nov 2, 2016 at 10:49 AM,
> <dev-digest-help@joshua.incubator.apache.org> wrote:
>
>> From: Matt Post <[email protected]>
>> To: [email protected]
>> Cc:
>> Date: Tue, 1 Nov 2016 16:26:29 -0400
>> Subject: Re: Community Review of New Language Pack
>> Lewis, can I get an MD5 or SHA1 checksum? I'm getting errors unpacking.
>>
>
> Yes, please see
> http://home.apache.org/~lewismc/language-pack-ru-en-2016-10-28.tar.gz.md5
>
>
>>
>> I do see that you built the LP with the old scripts. I'll write up
>> instructions on how to do it with the new set.
>>
>>
> Correct. I would greatly appreciate that, thank you Matt.
> Lewis
>
--
http://home.apache.org/~lewismc/
@hectorMcSpector
http://www.linkedin.com/in/lmcgibbney