I think I understand what you mean. The phrase table just records a relationship between source and target; it doesn’t have an impact on the translation quality.
Actually, the corpus data determines the translation quality entirely. Because the corpus data may come from various channels, it’s critical to have a tool that can control the quality of this data. I have checked the resources on the Moses website and found nothing related to this topic. How can I improve the quality of the training data, especially when the data set is very large and it cannot be done manually? Any suggestions are welcome.

Thanks,
Jun

From: Tom Hoar [mailto:[email protected]]
Sent: Friday, August 03, 2012 10:08 AM
To: Tan, Jun
Cc: [email protected]
Subject: Re: [Moses-support] help with filtering the noise enties in phrase-table.

All entries in a phrase table are "meaningless". They are merely mappings with probabilities (et al) between groupings of tokens in your source & target data. I suggest that better corpus preparation and management is a better way to manage the content in the tables than directly editing them.

Tom

On Thu, 2 Aug 2012 21:57:50 -0400, <[email protected]> wrote:

Hi all,

I'm using Moses as the decoder. After creating the translation model, I checked the phrase-table. There are a lot of meaningless entries, like the ones listed below. Can I remove these entries from the phrase-table? Will it have an impact on the translation quality? If so, how about I delete all the punctuation before creating the translation model?

! kadov- > 网络 ||| ! kadov- > Network ||| 1 0.341024 1 0.114004 2.718 ||| ||| 4 4
! kadov- > 网络 参数 ||| ! kadov- > Network parameters ||| 1 0.32747 1 0.0420637 2.718 ||| ||| 2 2
! kadov- > 网络 参数 和 ||| ! kadov- > Network parameters and ||| 1 0.19872 1 0.0395799 2.718 ||| ||| 2 2
! kadov- > 网络 参数 和 安全 ||| ! kadov- > Network parameters and security ||| 1 0.129761 1 0.0179069 2.718 ||| ||| 2 2
! kadov- > 网络 时间 ||| ! kadov- > Network Time ||| 1 0.225683 1 0.00977698 2.718 ||| ||| 2 2
! kadov- > 网络 时间 协议 ( ||| ! kadov- > Network Time Protocol ( ||| 1 0.0968096 1 0.00137715 2.718 ||| ||| 2 2
! kadov- > 网络 时间 协议 ||| ! kadov- > Network Time Protocol ||| 1 0.208681 1 0.00154011 2.718 ||| ||| 2 2
! kadov- > 脱机 LUN ||| ! kadov- > Offline LUN ||| 1 0.307783 1 0.03884 2.718 ||| ||| 1 1
! kadov- > 脱机 LUN 信息 ||| ! kadov- > Offline LUN information ||| 1 0.289121 1 0.0252108 2.718 ||| ||| 1 1
! kadov- > 脱机 LUN 信息 。 ||| ! kadov- > Offline LUN information . ||| 1 0.280973 1 0.0243627 2.718 ||| ||| 1 1
! kadov- > 脱机 ||| ! kadov- > Offline ||| 1 0.313173 1 0.0645311 2.718 ||| ||| 2 2
! kadov- > 脱机 状态 ||| ! kadov- > Offline state ||| 1 0.266374 1 0.0156563 2.718 ||| ||| 1 1
! kadov- > 自动 ||| ! kadov- > Auto ||| 1 0.507327 0.2 0.0200052 2.718 ||| ||| 1 5
! " ||| ! ” ||| 0.0185185 0.00821238 1 0.00102938 2.718 ||| ||| 54 1
! # % ' * + - ||| ! # % ' * + - ||| 1 0.00122135 1 0.239671 2.718 ||| ||| 2 2
! # % ' * + ||| ! # % ' * + ||| 1 0.00255049 1 0.750455 2.718 ||| ||| 2 2
! # % ' * ||| ! # % ' * ||| 1 0.00801901 1 0.773186 2.718 ||| ||| 2 2
! # % ' ||| ! # % ' ||| 1 0.0121474 1 0.784364 2.718 ||| ||| 2 2
! # % ||| ! # % ||| 1 0.0426557 1 0.897899 2.718 ||| ||| 2 2
! # ||| ! # ||| 1 0.0550974 1 0.908339 2.718 ||| ||| 2 2
! #$ % ^&* ||| ! # $ % ^ & * ||| 1 5.14084e-07 1 0.00855668 2.718 ||| ||| 2 2
! #$ % ||| ! # $ % ||| 1 6.54489e-05 1 0.23103 2.718 ||| ||| 2 2
! #$ ||| ! # $ ||| 1 8.45389e-05 1 0.233717 2.718 ||| |
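
One way to act on the suggestion of better corpus preparation is to filter the parallel corpus before training instead of editing the phrase table afterward. The following is only a rough sketch in Python, not a Moses tool: the `kadov-` pattern is inferred from the entries quoted in this thread, and the names `clean_segment`, `has_word_content`, `filter_pairs` and the `max_ratio` threshold are hypothetical choices made for illustration. It assumes the corpus is already tokenized and available as (source, target) string pairs.

```python
import re
import unicodedata

# Assumption: markup residue looks like "! kadov- >", as in the quoted entries.
MARKUP_RESIDUE = re.compile(r"!\s*kadov-\s*>")


def has_word_content(text):
    """Return True if the text contains at least one letter or digit (any script)."""
    return any(unicodedata.category(ch)[0] in ("L", "N") for ch in text)


def clean_segment(text):
    """Remove the assumed markup residue and collapse runs of whitespace."""
    text = MARKUP_RESIDUE.sub(" ", text)
    return " ".join(text.split())


def filter_pairs(pairs, max_ratio=9.0):
    """Yield cleaned (source, target) pairs, skipping likely noise.

    A pair is skipped when either side has no letters or digits after
    cleaning, or when the token counts differ by more than max_ratio.
    """
    for src, tgt in pairs:
        src, tgt = clean_segment(src), clean_segment(tgt)
        if not (has_word_content(src) and has_word_content(tgt)):
            continue  # empty or punctuation-only segment
        n_src, n_tgt = len(src.split()), len(tgt.split())
        if max(n_src, n_tgt) / min(n_src, n_tgt) > max_ratio:
            continue  # lengths too different to be a plausible translation
        yield src, tgt


if __name__ == "__main__":
    sample = [
        ("! kadov- > 网络 参数", "! kadov- > Network parameters"),
        ("! # %", "! # %"),
        ("脱机 状态", "Offline state"),
    ]
    for src, tgt in filter_pairs(sample):
        print(src, "|||", tgt)
```

Filtering at the corpus level means the alignment and phrase extraction steps never see the noisy segments, so the probabilities in the resulting table stay consistent with the data. Whether punctuation-only pairs should really be dropped depends on the domain; this sketch drops them only to show the mechanism.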
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
