Re: joshua API changes

Matt Post Wed, 25 May 2016 11:55:58 -0700

Whoops, the JOSHUA-273 branch (which has the API changes) wasn't pushed up. Here it is:


        https://github.com/apache/incubator-joshua/tree/JOSHUA-273

matt

> On May 25, 2016, at 2:07 PM, Matt Post <[email protected]> wrote:
> 
> Hi folks (especially Felix, Kellen, Tobi) — 
> 
> I made two moderate improvements to Joshua on the way home. The first was to 
> get rid of all the specialized phrase handling; the packer now works as we 
> discussed, packing everything into Hiero format, and the stack-based decoder 
> uses this directly now. Everything should be backwards compatible for hiero, 
> but it's not for phrase-based. I added a "version = 3" line to the packer 
> config to distinguish this, along with a check, so the decoder will throw a 
> runtime exception if you try to load something incompatible. If anything 
> fails, instead of repacking your grammar, just add the line "version = 3" to 
> the packer config. The changes only affect packing for phrase-based models, 
> so I don't think it will matter to you. This is pushed up into master.
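
For illustration, the version check described above could look roughly like the following. This is a minimal sketch, not the code on master: the class name, the method names, and the exact comparison (here, strict equality with 3) are assumptions, and the config is assumed to be a list of "key = value" lines.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Joshua: illustrates a packer-config version check.
class PackerVersionCheck {

    // Assumption: the decoder accepts exactly this version.
    static final int SUPPORTED_VERSION = 3;

    /** Returns the value of the "version = N" line, or -1 if no such line exists. */
    static int readVersion(List<String> configLines) {
        for (String line : configLines) {
            String[] parts = line.split("=", 2);
            if (parts.length == 2 && parts[0].trim().equals("version")) {
                return Integer.parseInt(parts[1].trim());
            }
        }
        return -1;
    }

    /** Throws a runtime exception if the config does not declare the supported version. */
    static void checkCompatible(List<String> configLines) {
        int version = readVersion(configLines);
        if (version != SUPPORTED_VERSION) {
            throw new RuntimeException("Incompatible packed grammar: found version "
                    + version + ", expected " + SUPPORTED_VERSION);
        }
    }

    public static void main(String[] args) {
        // A config that already carries the line passes silently.
        checkCompatible(Arrays.asList("max-source-len = 5", "version = 3"));
        System.out.println("config accepted");
    }
}
```

Under these assumptions, a config without the line fails the check, and adding "version = 3" is enough to make it pass.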
> 
> The bigger one is on a JOSHUA-273 branch. I just pushed up a refactoring of 
> the KBestExtraction / structured translation interface, per our discussions 
> this week. However, I wasn't actually sure how to use the API. What is the 
> entry point? Are you calling translate() directly and managing your own 
> thread pools? It doesn't seem like you would be using Decoder.decode() or 
> decodeAll(), since they're not very API-ish.
> 
> If you want to take a look at the changes, I'd welcome feedback, direct 
> changes, etc. Here is a description of the major changes:
> 
> - Large refactor of the Translation output interface
> 
> - Instead of returning Translation objects, the calls to Decoder.translate() 
> now return HyperGraph objects. As before, a HyperGraph represents the 
> complete (pruned) search space the decoder explored. A HyperGraph can then be 
> operated on by KBestExtractors and by the new TranslationFactory object, after 
> which the HyperGraph itself can be thrown away.
> 
> - KBestExtractor is now an iterator that takes a HyperGraph object and 
> returns DerivationState objects, each representing a single derivation tree
> 
> - Translation and StructuredTranslation are now combined. Translation is 
> effectively a dummy object with a number of fields of interest that get 
> populated by TranslationFactory, per explicit requests. Each request returns 
> the TranslationFactory object, so you can easily chain calls, and then 
> retrieve the Translation object at the end. e.g.,
> 
>       KBestExtractor extractor = new KBestExtractor(hg, ...);
>       for (DerivationState derivation : extractor) {
>               TranslationFactory factory = new TranslationFactory(derivation, ...);
>               Translation translation = factory.alignments()
>                                               .formattedTranslation(config.outputFormat)
>                                               .features()
>                                               .translation();
>       }
> 
> - Neither KBestExtractors nor Translation objects do any printing. This 
> encapsulation is a big improvement over the past. After building 
> your Translation objects, they will contain only small objects such as 
> strings, feature vectors, and alignments, that can be safely passed 
> downstream while the HyperGraph gets destroyed. Also, code for processing and 
> formatting is all now in one place, the TranslationFactory.
> 
> - Also, I removed the forest rescoring and OracleExtraction classes. These 
> are useful in principle but currently unused and hard to read, so they 
> should be rewritten. I will do this at some point.
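
The pattern in the list above (an extractor you can iterate over, plus a factory whose requests chain and fill in a plain result object) can be sketched in a self-contained way. Everything here is a hypothetical stand-in, not the classes on the JOSHUA-273 branch: Derivation, KBest, Factory, and their methods are invented for illustration, and the "search space" is just a list that is assumed to be sorted best-first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-ins only; none of these are Joshua's real classes.
class FluentFactorySketch {

    /** Stand-in for a single derivation (the real one would hold a derivation tree). */
    static class Derivation {
        final String text;
        final double score;
        Derivation(String text, double score) { this.text = text; this.score = score; }
    }

    /** Stand-in extractor: Iterable, so it can drive a for-each loop. */
    static class KBest implements Iterable<Derivation> {
        private final List<Derivation> derivations; // assumed sorted best-first
        private final int k;
        KBest(List<Derivation> derivations, int k) { this.derivations = derivations; this.k = k; }
        @Override
        public Iterator<Derivation> iterator() {
            return derivations.subList(0, Math.min(k, derivations.size())).iterator();
        }
    }

    /** Plain value object: small fields only, no reference back to the search space. */
    static class Translation {
        String formatted; // stays null unless requested
        Double score;     // stays null unless requested
    }

    /** Each request fills one field and returns the factory, so calls can be chained. */
    static class Factory {
        private final Derivation derivation;
        private final Translation result = new Translation();
        Factory(Derivation derivation) { this.derivation = derivation; }
        Factory formatted(String format) {
            result.formatted = String.format(format, derivation.text);
            return this;
        }
        Factory score() {
            result.score = derivation.score;
            return this;
        }
        Translation translation() { return result; }
    }

    /** Builds one Translation per derivation among the first k; no printing happens here. */
    static List<Translation> extract(List<Derivation> derivations, int k) {
        List<Translation> out = new ArrayList<>();
        for (Derivation d : new KBest(derivations, k)) {
            out.add(new Factory(d).formatted("[%s]").score().translation());
        }
        return out;
    }

    public static void main(String[] args) {
        List<Derivation> space = Arrays.asList(
                new Derivation("a b", -1.0), new Derivation("b a", -2.0));
        for (Translation t : extract(space, 1)) {
            System.out.println(t.formatted + " " + t.score);
        }
    }
}
```

The point of the design is that the caller decides which fields to populate and where the output goes, and the resulting objects hold nothing that keeps the larger structure alive.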
> 
> There are still a few things broken on the branch, but they are small and I 
> am working to fix them. If you have a minute to poke around on the branch, 
> please do, so that the end result is what you imagined when we were chatting 
> the other day.
> 
> matt
