I agree, and I would like to.
But this is tricky: look at the first 30 entries of my phrase table below.
This happens a lot at the start of the table, where the entries contain
&apos; or other odd codes and the EN/FR pairs do not match.
! ! ! ! ||| ! ! ! ! ||| 0.103413 0.132185 0.103413 0.401758 ||| 0-0 1-1 2-2 3-3 ||| 1 1 1 ||| |||
! ! ! ) ||| ! ! ! ) ||| 0.339323 0.167884 0.508985 0.4246 ||| 0-0 1-0 2-0 2-1 2-2 3-3 ||| 3 2 2 ||| |||
! ! ! ||| ! ! ! ||| 0.501834 0.219223 0.716905 0.50463 ||| 0-0 1-1 2-2 ||| 10 7 6 ||| |||
! ! ! ||| budget ! ! ! ||| 0.0517067 0.219223 0.0147733 4.50635e-05 ||| 0-1 1-2 2-3 ||| 2 7 1 ||| |||
! ! ) , ||| ! ! ) - , ||| 0.103413 0.111989 0.103413 0.00192967 ||| 0-0 1-1 2-2 3-3 3-4 ||| 1 1 1 ||| |||
! ! ) ||| ! ! ) ||| 0.103413 0.278429 0.103413 0.533321 ||| 0-0 1-1 2-2 ||| 1 1 1 ||| |||
! ! ||| ! ! ||| 0.625 0.363573 0.769231 0.633844 ||| 0-0 1-1 ||| 16 13 10 ||| |||
! ! ||| . ||| 4.65922e-08 6.71089e-07 0.00795487 0.140779 ||| 0-0 1-0 ||| 2.21954e+06 13 1 ||| |||
! ! ||| budget ! ! ||| 0.0517067 0.363573 0.00795487 5.66022e-05 ||| 0-1 1-2 ||| 2 13 1 ||| |||
! ! ||| nécessaire ! ! ||| 0.103413 0.363573 0.00795487 0.000130572 ||| 0-1 1-2 ||| 1 13 1 ||| |||
! [ never again ! ||| ! ||| 6.51628e-06 5.42074e-13 0.103413 0.796143 ||| 0-0 4-0 ||| 15870 1 1 ||| |||
! ] this is ||| tel est ||| 7.38667e-05 9.16191e-11 0.103413 0.00147917 ||| 2-0 3-1 ||| 1400 1 1 ||| |||
! ] this ||| tel ||| 1.09594e-05 1.44188e-10 0.103413 0.0035893 ||| 2-0 ||| 9436 1 1 ||| |||
! ] ||| ! ] ||| 0.103413 0.352335 0.103413 0.472387 ||| 0-0 1-1 ||| 1 1 1 ||| |||
! & quot ; ||| ! " . et ||| 0.0517067 2.36396e-12 0.0517067 1.88268e-05 ||| 0-0 1-1 2-1 3-3 ||| 2 2 1 ||| |||
! & quot ; ||| ! " ||| 0.000222394 1.44515e-11 0.0517067 0.518419 ||| 0-0 2-1 ||| 465 2 1 ||| |||
! & quot ||| ! " . ||| 0.000662906 8.30626e-09 0.0344711 0.00232791 ||| 0-0 1-1 2-1 ||| 156 3 1 ||| |||
! & quot ||| ! " ||| 0.00218918 8.30626e-09 0.339323 0.518419 ||| 0-0 2-1 ||| 465 3 2 ||| |||
! & ||| ! ||| 6.51628e-06 7.21755e-05 0.103413 0.796143 ||| 0-0 ||| 15870 1 1 ||| |||
! ' ] , addressed ||| ! " adressé ||| 0.103413 3.70838e-07 0.103413 0.00596848 ||| 0-0 1-1 2-1 4-2 ||| 1 1 1 ||| |||
! ' ] , ||| ! " ||| 0.000222394 2.49698e-06 0.103413 0.215573 ||| 0-0 1-1 2-1 ||| 465 1 1 ||| |||
! ' ] ||| ! " ||| 0.000222394 3.57128e-05 0.103413 0.215573 ||| 0-0 1-1 2-1 ||| 465 1 1 ||| |||
! ' ' Alstom shares ||| l' on constate un dysfonctionnement ||| 0.0344711 5.62605e-16 0.103413 1.03361e-14 ||| 1-0 2-0 1-1 3-4 4-4 ||| 3 1 1 ||| |||
! ' ' ||| l' on constate un ||| 0.0147733 1.56906e-11 0.0129267 2.2766e-12 ||| 1-0 2-0 1-1 ||| 7 8 1 ||| |||
! ' ' ||| l' on constate ||| 0.000984889 1.56906e-11 0.0129267 2.36929e-10 ||| 1-0 2-0 1-1 ||| 105 8 1 ||| |||
! ' ' ||| l' on ||| 6.76656e-06 1.56906e-11 0.0129267 6.18613e-06 ||| 1-0 2-0 1-1 ||| 15283 8 1 ||| |||
! ' ' ||| ou que l' on constate ||| 0.0344711 1.56906e-11 0.0129267 4.69534e-15 ||| 1-2 2-2 1-3 ||| 3 8 1 ||| |||
! ' ' ||| ou que l' on ||| 0.00304157 1.56906e-11 0.0129267 1.22594e-10 ||| 1-2 2-2 1-3 ||| 34 8 1 ||| |||
! ' ' ||| que l' on constate un ||| 0.0344711 1.56906e-11 0.0129267 4.56092e-14 ||| 1-1 2-1 1-2 ||| 3 8 1 ||| |||
! ' ' ||| que l' on constate ||| 0.00323167 1.56906e-11 0.0129267 4.74661e-12 ||| 1-1 2-1 1-2 ||| 32 8 1 ||| |||
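
To make entries like these easier to spot without reading the table by eye, here is a minimal sketch that splits an entry on "|||" and flags it when any score is very small. The function names (parse_entry, looks_suspicious) and the 1e-6 threshold are my own assumptions for illustration, not part of Moses, and a low score alone does not prove an entry is wrong.

```python
def parse_entry(line):
    """Split one text phrase-table entry into source, target, scores, alignment."""
    fields = [f.strip() for f in line.split("|||")]
    source, target = fields[0], fields[1]
    scores = [float(x) for x in fields[2].split()]
    alignment = fields[3] if len(fields) > 3 else ""
    return source, target, scores, alignment


def looks_suspicious(scores, threshold=1e-6):
    """Heuristic (an assumption): flag the entry if any score is below threshold."""
    return any(score < threshold for score in scores)


def flag_entries(lines, threshold=1e-6):
    """Yield (source, target) for every entry the heuristic flags."""
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        source, target, scores, _ = parse_entry(line)
        if looks_suspicious(scores, threshold):
            yield source, target


if __name__ == "__main__":
    sample = [
        "! ! ||| ! ! ||| 0.625 0.363573 0.769231 0.633844 ||| 0-0 1-1 ||| 16 13 10 ||| |||",
        "! ! ||| . ||| 4.65922e-08 6.71089e-07 0.00795487 0.140779 ||| 0-0 1-0 ||| 2.21954e+06 13 1 ||| |||",
    ]
    for source, target in flag_entries(sample):
        print(source, "|||", target)
```

This only reports candidates; deciding what to do with them is still a manual step.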
On 23/09/2015 15:12, Tom Hoar wrote:
Vincent,
If you suspect bad entries, isn't it better to address the root of the
problem and prepare your training corpus better?
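
One possible reading of "prepare your training corpus better" is sketched below: decode leftover HTML entities in the raw text and drop sentence pairs that are empty, very long, or badly mismatched in length. This assumes the entities (&quot;, &apos;) are already present in the raw corpus before tokenization; clean_pair, clean_corpus, and the limits 3.0 and 80 are hypothetical choices for illustration, not Moses defaults.

```python
import html


def clean_pair(src, tgt, max_ratio=3.0, max_len=80):
    """Return a cleaned (src, tgt) pair, or None if the pair should be dropped."""
    # Decode entities such as &quot; and &apos;, then normalize whitespace.
    src = " ".join(html.unescape(src).split())
    tgt = " ".join(html.unescape(tgt).split())
    n_src, n_tgt = len(src.split()), len(tgt.split())
    if n_src == 0 or n_tgt == 0:
        return None  # one side is empty
    if n_src > max_len or n_tgt > max_len:
        return None  # sentence too long
    if max(n_src, n_tgt) / min(n_src, n_tgt) > max_ratio:
        return None  # lengths too different to be a plausible translation
    return src, tgt


def clean_corpus(pairs, **kwargs):
    """Yield only the pairs that survive clean_pair."""
    for src, tgt in pairs:
        cleaned = clean_pair(src, tgt, **kwargs)
        if cleaned is not None:
            yield cleaned
```

The cleaned pairs would then go through the usual tokenization and training steps, so that the odd codes never reach the phrase table in the first place.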
On 9/23/2015 6:46 PM, [email protected] wrote:
Date: Tue, 22 Sep 2015 20:24:02 +0200
From: Philipp Koehn<[email protected]>
Subject: Re: [Moses-support] is there a way to remove a bad entry in
the phrase table ?
To: Vincent Nguyen<[email protected]>
Cc: moses-support<[email protected]>
Hi,
you can remove it manually (just edit the text file), there will be no
negative consequences.
However, it is not a realistic strategy to try to remove by hand every
offending phrase table entry.
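
For the case where a handful of bad entries are already known, the manual edit can be scripted. The following is a minimal sketch, assuming a text-format phrase table (plain or gzip-compressed) with fields separated by "|||"; remove_entries and the blocklist of (source, target) pairs are hypothetical names for illustration, not a Moses tool. A binarized table would presumably need to be rebuilt from the filtered text.

```python
import gzip


def open_text(path, mode):
    """Open a plain or gzip-compressed text file, chosen by file extension."""
    if path.endswith(".gz"):
        return gzip.open(path, mode + "t", encoding="utf-8")
    return open(path, mode, encoding="utf-8")


def entry_key(line):
    """Return the (source, target) pair of an entry, or None if malformed."""
    fields = line.split("|||")
    if len(fields) < 2:
        return None
    return fields[0].strip(), fields[1].strip()


def remove_entries(in_path, out_path, blocklist):
    """Copy the table to out_path, skipping entries listed in blocklist.

    Returns the number of entries removed.
    """
    removed = 0
    with open_text(in_path, "r") as src, open_text(out_path, "w") as dst:
        for line in src:
            if entry_key(line) in blocklist:
                removed += 1
                continue
            dst.write(line)
    return removed
```

A call such as remove_entries("phrase-table.gz", "phrase-table.filtered.gz", {("! !", ".")}) would drop that single pair and leave every other line untouched; the file names here are placeholders.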
-phi
On Tue, Sep 22, 2015 at 4:05 PM, Vincent Nguyen<[email protected]> wrote:
>Hi,
>
>I was wondering: if, after analyzing the BLEU-Annotation file, we
>realize that there must be a bad entry in the phrase table,
>can we remove it manually or in some other way?
>
>Thanks.
>V.
>_______________________________________________
>Moses-support mailing list
>[email protected]
>http://mailman.mit.edu/mailman/listinfo/moses-support
>
--
Best regards,
Tom Hoar
Chief Executive Officer
Precision Translation Tools Pte Ltd
Singapore/Thailand
Web: http://www.precisiontranslationtools.com
Thailand Mobile: +66 87 345-1875
Skype: tahoar