I think I understand your point. The phrase table only records the relationship 
between source and target phrases; by itself it doesn't determine the 
translation quality.

It is really the corpus data that determines the translation quality. Because 
the corpus data may come from various channels, it's critical to have a tool 
that can control the quality of this data. I have checked the resources on the 
Moses website but found nothing related to this topic.

How can I improve the quality of the training data, especially when the data 
set is very large and it's impossible to check it manually?
Any suggestions are welcome.
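
To make the question concrete, here is a minimal sketch of the kind of automatic filtering that could be applied to a large parallel corpus before training. It is only an illustration, not a Moses tool: the function names, the thresholds (MAX_TOKENS, MAX_LENGTH_RATIO) and the markup pattern are all assumptions, and the "kadov" check is simply taken from the entries in the sample further down. Moses does ship a clean-corpus-n.perl script that covers length-based filtering.

```python
import re

# Hypothetical thresholds; tune them for your own corpus.
MAX_TOKENS = 80
MAX_LENGTH_RATIO = 9.0
# Assumed sign of leftover markup, based on the "kadov" entries in the sample.
MARKUP_RESIDUE = re.compile(r"kadov|<[^>]+>")


def keep_pair(source: str, target: str) -> bool:
    """Return True if a tokenized sentence pair passes the basic checks."""
    src_tokens = source.split()
    tgt_tokens = target.split()
    # Drop pairs where either side is empty.
    if not src_tokens or not tgt_tokens:
        return False
    # Drop pairs where either side is too long.
    if len(src_tokens) > MAX_TOKENS or len(tgt_tokens) > MAX_TOKENS:
        return False
    # Drop pairs whose lengths differ too much; they are often misaligned.
    longer = max(len(src_tokens), len(tgt_tokens))
    shorter = min(len(src_tokens), len(tgt_tokens))
    if longer / shorter > MAX_LENGTH_RATIO:
        return False
    # Drop pairs that still contain markup residue.
    if MARKUP_RESIDUE.search(source) or MARKUP_RESIDUE.search(target):
        return False
    return True


def filter_corpus(pairs):
    """Drop bad pairs and exact duplicates, keeping the original order."""
    seen = set()
    kept = []
    for source, target in pairs:
        key = (source.strip(), target.strip())
        if key in seen or not keep_pair(*key):
            continue
        seen.add(key)
        kept.append(key)
    return kept
```

Dropping a pair that contains markup residue is the simplest choice; stripping the residue and keeping the rest of the sentence would preserve more data, at the cost of a more careful pattern.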

Thanks,
Jun


From: Tom Hoar [mailto:[email protected]]
Sent: Friday, August 03, 2012 10:08 AM
To: Tan, Jun
Cc: [email protected]
Subject: Re: [Moses-support] help with filtering the noise enties in 
phrase-table.


All entries in a phrase table are "meaningless". They are merely mappings 
with probabilities (et al.) between groupings of tokens in your source & 
target data. I suggest that better corpus preparation and management is a 
better way to manage the content of the tables than editing them directly.

Tom



On Thu, 2 Aug 2012 21:57:50 -0400, <[email protected]> wrote:

Hi all,

I'm using Moses as the decoder. After creating the translation model, I 
checked the phrase table. There are a lot of meaningless entries, like the 
ones below:

Can I remove these entries from the phrase table? Will that have an impact on 
the translation quality? If so, how about deleting all the punctuation before 
creating the translation model?




! kadov- > 网络 ||| ! kadov- > Network ||| 1 0.341024 1 0.114004 2.718 ||| ||| 4 4
! kadov- > 网络 参数 ||| ! kadov- > Network parameters ||| 1 0.32747 1 0.0420637 2.718 ||| ||| 2 2
! kadov- > 网络 参数 和 ||| ! kadov- > Network parameters and ||| 1 0.19872 1 0.0395799 2.718 ||| ||| 2 2
! kadov- > 网络 参数 和 安全 ||| ! kadov- > Network parameters and security ||| 1 0.129761 1 0.0179069 2.718 ||| ||| 2 2
! kadov- > 网络 时间 ||| ! kadov- > Network Time ||| 1 0.225683 1 0.00977698 2.718 ||| ||| 2 2
! kadov- > 网络 时间 协议 ( ||| ! kadov- > Network Time Protocol ( ||| 1 0.0968096 1 0.00137715 2.718 ||| ||| 2 2
! kadov- > 网络 时间 协议 ||| ! kadov- > Network Time Protocol ||| 1 0.208681 1 0.00154011 2.718 ||| ||| 2 2
! kadov- > 脱机 LUN ||| ! kadov- > Offline LUN ||| 1 0.307783 1 0.03884 2.718 ||| ||| 1 1
! kadov- > 脱机 LUN 信息 ||| ! kadov- > Offline LUN information ||| 1 0.289121 1 0.0252108 2.718 ||| ||| 1 1
! kadov- > 脱机 LUN 信息 。 ||| ! kadov- > Offline LUN information . ||| 1 0.280973 1 0.0243627 2.718 ||| ||| 1 1
! kadov- > 脱机 ||| ! kadov- > Offline ||| 1 0.313173 1 0.0645311 2.718 ||| ||| 2 2
! kadov- > 脱机 状态 ||| ! kadov- > Offline state ||| 1 0.266374 1 0.0156563 2.718 ||| ||| 1 1
! kadov- > 自动 ||| ! kadov- > Auto ||| 1 0.507327 0.2 0.0200052 2.718 ||| ||| 1 5
! " ||| ! ” ||| 0.0185185 0.00821238 1 0.00102938 2.718 ||| ||| 54 1
! # % ' * + - ||| ! # % ' * + - ||| 1 0.00122135 1 0.239671 2.718 ||| ||| 2 2
! # % ' * + ||| ! # % ' * + ||| 1 0.00255049 1 0.750455 2.718 ||| ||| 2 2
! # % ' * ||| ! # % ' * ||| 1 0.00801901 1 0.773186 2.718 ||| ||| 2 2
! # % ' ||| ! # % ' ||| 1 0.0121474 1 0.784364 2.718 ||| ||| 2 2
! # % ||| ! # % ||| 1 0.0426557 1 0.897899 2.718 ||| ||| 2 2
! # ||| ! # ||| 1 0.0550974 1 0.908339 2.718 ||| ||| 2 2
! #$ % ^&* ||| ! # $ % ^ & * ||| 1 5.14084e-07 1 0.00855668 2.718 ||| ||| 2 2
! #$ % ||| ! # $ % ||| 1 6.54489e-05 1 0.23103 2.718 ||| ||| 2 2
! #$ ||| ! # $ ||| 1 8.45389e-05 1 0.233717 2.718 ||| |
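
Entries like the ones above can be inspected programmatically instead of edited by hand. The following is a rough sketch, assuming each line has the layout shown in the sample, with fields separated by "|||" and the first three being the source phrase, the target phrase and the scores. The helper names parse_entry and is_punctuation_only are hypothetical. It only flags punctuation-only phrases, so that the patterns behind them can be traced back to the corpus.

```python
import unicodedata


def parse_entry(line: str):
    """Split one phrase-table line into source, target, and float scores."""
    fields = [field.strip() for field in line.split("|||")]
    if len(fields) < 3:
        raise ValueError("expected at least: source ||| target ||| scores")
    source, target, scores = fields[0], fields[1], fields[2]
    return source, target, [float(score) for score in scores.split()]


def is_punctuation_only(phrase: str) -> bool:
    """True if every non-space character is punctuation or a symbol."""
    chars = [c for c in phrase if not c.isspace()]
    # Unicode categories starting with P (punctuation) or S (symbol).
    return bool(chars) and all(unicodedata.category(c)[0] in "PS" for c in chars)
```

For example, is_punctuation_only applied to the source side of the "! # %" entry returns True, while the "kadov" entries are not flagged because they contain letters; those are better caught by cleaning the markup out of the corpus itself.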


_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
