Basically a decoder does a beam search through a vast set of options, guided by various probabilities (those tables I mentioned are mainly language models). It would be very interesting indeed to really open up the decoder in this manner: get MR to do a parallel search over the search graph. My initial guess is that this would be too slow, since the actual decoding can be done in-memory on a typical Dell box (e.g. in 32G). That is per sentence; per document is a different matter.

What I had in mind was something less radical: take a document, split it into shards, translate each shard (as a mapper) and have the reducer recombine the translated sentences (sketched below). Hypertable would be used to serve the language models and phrase tables (string-to-string mappings, specifying the set of possible ways some source fragment gets translated into target fragments).

Language models are interesting in that the vast majority of requests won't be for a valid n-gram (the decoder searches over a space of possible translations, most of which are garbage). So one possible speed-up would be a small in-memory, on-board cache of the table (as a Bloom filter). More interesting still would be to consider the stream of language-model requests and see whether it can itself be modelled (think of this as, say, Bernoulli trials: do I get a hit or not). If there is structure in this stream, then I could design an algorithm to learn when the decoder is in a sequence of misses (exploring some low-probability space of translations) or when it has switched to, say, a high-probability region.
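To make the shard-and-recombine plan concrete, here is a rough sketch against the Hadoop 0.20 "mapreduce" API. The Decoder class is a hypothetical stand-in for our phrase-based decoder, not anything that exists today; everything else is stock Hadoop. With TextInputFormat each record is one source sentence keyed by its byte offset, and a single reducer writes the translations back out in offset order.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class ShardTranslate {

  /** Hypothetical stand-in for the real phrase-based decoder. */
  public static class Decoder {
    public Decoder(Configuration conf) {
      // in reality: open the phrase tables and language model here
    }
    public String translate(String source) {
      return source;  // placeholder: a real decoder would run its beam search here
    }
  }

  /** Each input record is one source sentence; the byte-offset key remembers
   *  where the sentence sat in the document. */
  public static class TranslateMapper
      extends Mapper<LongWritable, Text, LongWritable, Text> {

    private Decoder decoder;

    @Override
    protected void setup(Context context) {
      decoder = new Decoder(context.getConfiguration());  // load models once per shard
    }

    @Override
    protected void map(LongWritable offset, Text sentence, Context context)
        throws IOException, InterruptedException {
      context.write(offset, new Text(decoder.translate(sentence.toString())));
    }
  }

  /** Run with a single reducer: keys arrive sorted by offset, so writing the
   *  translations back out recombines the document in its original order. */
  public static class RecombineReducer
      extends Reducer<LongWritable, Text, LongWritable, Text> {

    @Override
    protected void reduce(LongWritable offset, Iterable<Text> translations,
                          Context context) throws IOException, InterruptedException {
      for (Text t : translations) {
        context.write(offset, t);
      }
    }
  }
}
```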
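And the Bloom-filter idea, roughly: build the filter offline over the n-grams the language model actually contains, ship it to every node, and consult it before any network request; a filter miss is a guaranteed table miss, so only probable hits (and the occasional false positive) go over the wire. The sketch uses Hadoop's bundled org.apache.hadoop.util.bloom.BloomFilter; the RemoteTable interface standing in for the Hypertable client is hypothetical.

```java
import org.apache.hadoop.util.bloom.BloomFilter;
import org.apache.hadoop.util.bloom.Key;
import org.apache.hadoop.util.hash.Hash;

/** Gate remote language-model lookups with a local Bloom filter:
 *  if the n-gram is not in the filter it is definitely not in the LM,
 *  so we can answer "no hit" without touching the network. */
public class BloomGatedLanguageModel {

  /** Hypothetical stand-in for the Hypertable (or HBase) client. */
  public interface RemoteTable {
    Double lookup(String ngram);   // null when the n-gram is absent
  }

  private final BloomFilter filter;
  private final RemoteTable table;

  public BloomGatedLanguageModel(RemoteTable table, int bits, int hashes) {
    this.table = table;
    this.filter = new BloomFilter(bits, hashes, Hash.MURMUR_HASH);
  }

  /** Called once, offline, for every n-gram the language model actually contains. */
  public void addKnownNgram(String ngram) {
    filter.add(new Key(ngram.getBytes()));
  }

  /** The decoder's query path: most garbage n-grams die here, locally. */
  public Double probability(String ngram) {
    if (!filter.membershipTest(new Key(ngram.getBytes()))) {
      return null;                   // definite miss: no network request
    }
    return table.lookup(ngram);      // probable hit (or a false positive)
  }
}
```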
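The hit/miss-stream idea is much more speculative. As a strawman, even an exponentially weighted estimate of the recent hit rate would give the decoder a signal that it has wandered into a low-probability region (where lookups could be batched more aggressively or pruned earlier) versus a high-probability one. The decay and threshold constants below are purely illustrative, not tuned values.

```java
/** Strawman for detecting structure in the hit/miss stream: treat each
 *  language-model request as a Bernoulli trial and keep an exponentially
 *  weighted estimate of the hit rate. */
public class HitStreamModel {

  private static final double DECAY = 0.95;       // weight on the past
  private static final double LOW_REGION = 0.05;  // below this, assume a run of misses

  private double hitRate = 0.5;                   // uninformative starting estimate

  /** Record the outcome of one LM request. */
  public void observe(boolean hit) {
    hitRate = DECAY * hitRate + (1.0 - DECAY) * (hit ? 1.0 : 0.0);
  }

  /** True when recent requests have almost all been misses. */
  public boolean inLowProbabilityRegion() {
    return hitRate < LOW_REGION;
  }
}
```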
Miles

2009/7/18 Ted Dunning <[email protected]>

> On Sat, Jul 18, 2009 at 1:30 AM, Miles Osborne <[email protected]> wrote:
>
> > I wish people would consider more often using Hypertable / HBase etc. in
> > algorithms: there are times when you want random access and, as a
> > community, I think we need to put more effort into working out good ways
> > to use all the options available.
>
> I think that this is very sage advice.  I would actually add Lucene to this
> list.  By doing an implicit matrix multiply and sparsification, it provides
> an extraordinarily useful primitive operation.
>
> > Currently, as a background task, I'm thinking how to Hadoop-ify our
> > Machine Translation system; this involves random access to big tables of
> > string-value pairs, as well as a few other tables.  Compressed, these can
> > be 20G each and we need to hit these tables 100s of thousands of times
> > per sentence we translate.
>
> For translating a single sentence, this is a very plausible design option.
>
> > So, the research questions here then become how to (a) modify the Machine
> > Translation decoding procedure to best batch up table requests --Google
> > have published on this-- and, more interestingly, try to be more clever
> > about whether a network request actually needs to be made.
>
> And this problem is also very interesting if you consider how these
> operations can be interleaved and reordered if you are translating
> thousands of sentences at a time.  Can you rephrase the problem so that the
> map takes a single sentence and emits all of the queries it would like to
> do for that sentence?  The map would also inject all of the tables that you
> want to query.  Then reduce can group by table key and "perform" the
> lookups for each key value.  Then a second map-reduce would reorganize the
> results back to the sentence level.
>
> If your tables exceed memory, or if sorting many queries is asymptotically
> faster than issuing them as individual random lookups, this could be much
> faster.
>
> --
> Ted Dunning, CTO
> DeepDyve
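For reference, a rough sketch of the batched-lookup pipeline Ted describes above: the first job's mapper emits one record per table query, tagged with the sentence that asked for it; the reducer sees every request for a given key together and performs the lookup once (or simply joins against the table data, if the tables are injected as extra map inputs). The second job, which regroups the pass-1 output by its sentence-id key, is omitted for brevity. QueryExtractor and Tables are hypothetical helpers, not existing APIs.

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class BatchedLookups {

  /** Pass 1, map: one source sentence in, one record per table query out.
   *  The output key is the query ("table-name \t ngram-or-phrase"); the value
   *  remembers which sentence asked (here, its byte offset used as an id). */
  public static class EmitQueriesMapper
      extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable sentenceId, Text sentence, Context context)
        throws IOException, InterruptedException {
      for (String query : QueryExtractor.queriesFor(sentence.toString())) {
        context.write(new Text(query), new Text(sentenceId.toString()));
      }
    }
  }

  /** Pass 1, reduce: all sentences that want the same key arrive together, so
   *  the table is consulted once per distinct key.  Output is keyed by
   *  sentence id, ready for the second job to regroup by sentence. */
  public static class LookupReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text query, Iterable<Text> sentenceIds, Context context)
        throws IOException, InterruptedException {
      String value = Tables.lookup(query.toString());  // hypothetical table access
      if (value == null) return;                       // most queries miss
      for (Text sentenceId : sentenceIds) {
        context.write(sentenceId, new Text(query + "\t" + value));
      }
    }
  }

  /** Hypothetical stand-ins for the decoder's query generator and for
   *  whatever serves the phrase tables / language model. */
  static class QueryExtractor {
    static Iterable<String> queriesFor(String sentence) {
      return java.util.Arrays.asList(sentence.split("\\s+"));  // placeholder: unigrams only
    }
  }
  static class Tables {
    static String lookup(String query) { return null; }        // placeholder
  }
}
```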
