Hello all,
I would like to know which method Moses implements for on-demand loading
of the rule table when hierarchical phrase-based models are used. Is it
the same method used for the phrase table in phrase-based SMT, i.e. a
prefix tree (trie) as described by Zens & Ney (2007)?
Zens & Ney (2007): "Efficient Phrase Table Representation for
Machine Translation with Applications to Online MT and Speech
Translation"
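To make clear what I mean by a prefix tree, here is a minimal Python
sketch of a phrase table stored as a trie over source words. It is only
my own illustration, not the Moses code nor the exact structure of Zens
& Ney; the names TrieNode, PhraseTrie, insert and lookup are
hypothetical, and a real on-demand version would read child nodes from
disk instead of keeping everything in memory.

```python
class TrieNode:
    """One node per source-word prefix (hypothetical illustration)."""

    def __init__(self):
        self.children = {}      # source word -> TrieNode
        self.translations = []  # (target phrase, score) pairs ending here


class PhraseTrie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, source, target, score):
        # Walk down one node per source word, creating nodes as needed.
        node = self.root
        for word in source.split():
            node = node.children.setdefault(word, TrieNode())
        node.translations.append((target, score))

    def lookup(self, source):
        # Follow the source words; stop early if the prefix is unknown.
        node = self.root
        for word in source.split():
            node = node.children.get(word)
            if node is None:
                return []
        return node.translations


trie = PhraseTrie()
trie.insert("das haus", "the house", 0.8)
trie.insert("das haus", "the home", 0.2)
print(trie.lookup("das haus"))
```

The point of the structure is that only the nodes along the path of the
phrases in the input sentence need to be visited (or loaded).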
In the literature I have found papers describing the use of suffix
arrays both for phrase-based SMT (Callison-Burch, Bannard & Schroeder,
2005; Zhang & Vogel 2005), and for hierarchical phrase-based SMT (Lopez,
2008; Schwartz & Callison-Burch, 2010), but all these methods store the
parallel corpus and compute the required probabilities on the fly.
Callison-Burch, Bannard & Schroeder (2005): "Scaling Phrase-Based
Statistical Machine Translation to Large Corpora and Longer
Phrases"
Zhang & Vogel (2005): "An Efficient Phrase-to-Phrase Alignment Model
for Arbitrarily Long Phrases and Large Corpora"
Lopez (2008): "Tera-Scale Translation Models via Pattern Matching"
Schwartz & Callison-Burch (2010): "Hierarchical Phrase-Based Grammar
Extraction in Joshua"
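For reference, this is roughly what I understand by the suffix array
approach: the source side of the corpus is indexed, and all occurrences
of a phrase are found by binary search at query time. The sketch below
is a simplified illustration of mine, not the method of any of the
papers above; build_suffix_array and find_occurrences are hypothetical
names, and the step that would use the word alignments to extract
translations and estimate probabilities is left out.

```python
def build_suffix_array(tokens):
    # Positions of all suffixes, sorted by the token sequence they start.
    # (Naive construction, fine for a toy corpus only.)
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])


def find_occurrences(tokens, suffix_array, phrase):
    """Return the sorted corpus positions where `phrase` (a token list) starts."""
    n = len(phrase)

    def prefix(k):
        start = suffix_array[k]
        return tokens[start:start + n]

    # Lower bound: first suffix whose n-token prefix is >= phrase.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < phrase:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Upper bound: first suffix whose n-token prefix is > phrase.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= phrase:
            lo = mid + 1
        else:
            hi = mid

    return sorted(suffix_array[first:lo])


corpus = "the house is small the house is big".split()
sa = build_suffix_array(corpus)
print(find_occurrences(corpus, sa, ["the", "house"]))
```

In contrast to a precomputed phrase table, nothing here is stored per
phrase pair; everything is derived from the corpus positions returned.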
In addition, I would like to know whether Moses implements any
compression technique to save memory or disk space, or whether it just
identifies each word by a 32-bit integer, and which data structure
Moses uses to store the phrase table in memory.
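By "identifying each word by an integer" I mean something like the
following Python sketch, where every distinct word gets a 32-bit
identifier and a phrase is stored as a packed array of such
identifiers. This is only an assumption of mine about the uncompressed
baseline, not a description of how Moses actually stores words; the
names Vocabulary, pack_phrase and unpack_phrase are hypothetical.

```python
import struct


class Vocabulary:
    """Maps each distinct word to a small integer id (hypothetical sketch)."""

    def __init__(self):
        self.word_to_id = {}
        self.id_to_word = []

    def get_id(self, word):
        # Assign the next free id the first time a word is seen.
        if word not in self.word_to_id:
            self.word_to_id[word] = len(self.id_to_word)
            self.id_to_word.append(word)
        return self.word_to_id[word]


def pack_phrase(vocab, phrase):
    # Store the phrase as little-endian unsigned 32-bit ids, 4 bytes per word.
    ids = [vocab.get_id(w) for w in phrase.split()]
    return struct.pack("<%dI" % len(ids), *ids)


def unpack_phrase(vocab, data):
    ids = struct.unpack("<%dI" % (len(data) // 4), data)
    return " ".join(vocab.id_to_word[i] for i in ids)


vocab = Vocabulary()
packed = pack_phrase(vocab, "the house is small")
print(len(packed), unpack_phrase(vocab, packed))
```

With this scheme a phrase of n words always costs 4n bytes, which is
why I am asking whether anything more compact is used.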
If I have overlooked any relevant work, please let me know. I
appreciate your help.
Thank you very much in advance.
Kind regards.
--
Felipe
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support