Whoops, the JOSHUA-273 branch (which contains the API changes) wasn't pushed up. Here it is:
https://github.com/apache/incubator-joshua/tree/JOSHUA-273
matt
> On May 25, 2016, at 2:07 PM, Matt Post <[email protected]> wrote:
>
> Hi folks (especially Felix, Kellen, Tobi) —
>
> I made two moderate improvements to Joshua on the way home. The first was to
> get rid of all the specialized phrase handling: the packer now works as we
> discussed, packing everything into Hiero format, and the stack-based decoder
> uses this format directly. Everything should be backwards compatible for
> Hiero, but not for phrase-based models. I added a "version = 3" line to the
> packer config to distinguish the new format, along with a check, so the
> decoder will throw a runtime exception if you try to load an incompatible
> grammar. If an existing Hiero grammar fails this check, there is no need to
> repack it: just add the line "version = 3" to its packer config. The changes
> only affect packing for phrase-based models, so I don't think they will
> matter to you. This is pushed up to master.
>
> The bigger one is on a JOSHUA-273 branch. I just pushed up a refactoring of
> the KBestExtraction / structured translation interface, per our discussions
> this week. However, I wasn't actually sure how to use the API. What is the
> entry point? Are you calling translate() directly and managing your own
> thread pools? It doesn't seem like you would be using Decoder.decode() or
> decodeAll(), since they're not very API-ish.
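For reference, here is roughly what "calling translate() directly and managing your own thread pool" could look like on the caller's side. This is a minimal sketch: PooledTranslation and the Function<String, String> standing in for translate() are hypothetical placeholders, not the branch's actual signatures.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

/**
 * Hypothetical sketch: the caller owns the thread pool and invokes a
 * per-sentence translate function directly. The Function parameter is a
 * placeholder for whatever the real entry point turns out to be.
 */
final class PooledTranslation {
  static List<String> translateAll(List<String> sentences,
                                   Function<String, String> translate,
                                   int numThreads)
      throws InterruptedException, ExecutionException {
    ExecutorService pool = Executors.newFixedThreadPool(numThreads);
    try {
      // Submit one task per sentence; sentences are translated concurrently.
      List<Future<String>> futures = new ArrayList<>();
      for (String sentence : sentences) {
        Callable<String> task = () -> translate.apply(sentence);
        futures.add(pool.submit(task));
      }
      // Collect results in submission order so output lines up with input.
      List<String> results = new ArrayList<>();
      for (Future<String> future : futures) {
        results.add(future.get());
      }
      return results;
    } finally {
      pool.shutdown(); // worker threads exit once queued tasks finish
    }
  }
}
```

The sketch assumes the translate function is safe to call from several threads at once; whether the decoder's per-sentence state allows that is not something it checks.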
>
> If you want to take a look at the changes, I'd welcome feedback, direct
> changes, etc. Here is a description of the major changes:
>
> - Large refactor of the Translation output interface
>
> - Instead of returning Translation objects, calls to Decoder.translate()
> now return HyperGraph objects. As before, a HyperGraph represents the
> complete (pruned) search space the decoder explored. A HyperGraph can then be
> operated on by KBestExtractors and by the new TranslationFactory object,
> after which the HyperGraph itself can be thrown away.
>
> - KBestExtractor is now an iterator that takes a HyperGraph object and
> returns DerivationState objects, each representing a single derivation tree.
>
> - Translation and StructuredTranslation are now combined. Translation is
> effectively a plain data object whose fields of interest get populated by
> TranslationFactory on explicit request. Each request returns the
> TranslationFactory object, so you can easily chain calls and then retrieve
> the Translation object at the end. For example:
>
>     KBestExtractor extractor = new KBestExtractor(hg, ...);
>     for (DerivationState derivation : extractor) {
>       TranslationFactory factory = new TranslationFactory(derivation, ...);
>       Translation translation = factory.alignments()
>           .formattedTranslation(config.outputFormat)
>           .features()
>           .translation();
>     }
>
> - Neither KBestExtractors nor Translation objects do any printing, which is
> a big improvement in encapsulation over the old code. After you build your
> Translation objects, they contain only small objects such as strings,
> feature vectors, and alignments, which can be safely passed downstream while
> the HyperGraph gets destroyed. Also, all the code for processing and
> formatting is now in one place, the TranslationFactory.
>
> - Also, I removed the forest rescoring and OracleExtraction classes. These
> are useful but currently unused, and they are hard to read, so they should
> be rewritten. I will do this at some point.
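To make the intended flow concrete, here is a self-contained toy version of the pipeline described in the list above: a graph is consumed by an iterable k-best extractor, each derivation goes through a chained factory, and the resulting translation keeps only strings and a small map. All Toy* classes are hypothetical stand-ins named after the classes in the email, not the branch's code. The toy graph is just a flat list of scored candidates, so "extraction" here is a priority-queue pop rather than real lazy k-best search over hyperedges, and the %s/%c format placeholders are assumed for illustration.

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.PriorityQueue;

/** One derivation: output words, model score, feature values, word alignment. */
final class ToyDerivation {
  final List<String> words;
  final double score;
  final Map<String, Double> features;
  final String alignment;
  ToyDerivation(List<String> words, double score, Map<String, Double> features, String alignment) {
    this.words = words;
    this.score = score;
    this.features = features;
    this.alignment = alignment;
  }
}

/** Toy search space: a flat list of scored candidates, not real nodes and hyperedges. */
final class ToyHyperGraph {
  final List<ToyDerivation> candidates;
  ToyHyperGraph(List<ToyDerivation> candidates) { this.candidates = candidates; }
}

/** Iterable over at most k derivations, best score first. */
final class ToyKBestExtractor implements Iterable<ToyDerivation> {
  private final ToyHyperGraph graph;
  private final int k;
  ToyKBestExtractor(ToyHyperGraph graph, int k) {
    this.graph = graph;
    this.k = k;
  }
  @Override
  public Iterator<ToyDerivation> iterator() {
    // Max-heap on score; each call to iterator() starts a fresh pass.
    PriorityQueue<ToyDerivation> queue = new PriorityQueue<>(
        Comparator.comparingDouble((ToyDerivation d) -> d.score).reversed());
    queue.addAll(graph.candidates);
    return new Iterator<ToyDerivation>() {
      private int emitted = 0;
      @Override
      public boolean hasNext() {
        return emitted < k && !queue.isEmpty();
      }
      @Override
      public ToyDerivation next() {
        if (!hasNext()) throw new NoSuchElementException();
        emitted++;
        return queue.poll();
      }
    };
  }
}

/** Plain data object: only strings and a small map, no link back to the graph. */
final class ToyTranslation {
  final String text;                  // null unless formattedTranslation() was requested
  final Map<String, Double> features; // null unless features() was requested
  final String alignment;             // null unless alignments() was requested
  ToyTranslation(String text, Map<String, Double> features, String alignment) {
    this.text = text;
    this.features = features;
    this.alignment = alignment;
  }
}

/** Fills in a ToyTranslation on explicit request; each request returns this for chaining. */
final class ToyTranslationFactory {
  private final ToyDerivation derivation;
  private String text;
  private Map<String, Double> features;
  private String alignment;
  ToyTranslationFactory(ToyDerivation derivation) { this.derivation = derivation; }
  /** Assumed placeholders for this sketch: %s = output words, %c = model score. */
  ToyTranslationFactory formattedTranslation(String format) {
    text = format.replace("%s", String.join(" ", derivation.words))
        .replace("%c", String.valueOf(derivation.score));
    return this;
  }
  ToyTranslationFactory features() {
    // Defensive copy so the result shares no mutable state with the derivation.
    features = Collections.unmodifiableMap(new LinkedHashMap<>(derivation.features));
    return this;
  }
  ToyTranslationFactory alignments() {
    alignment = derivation.alignment;
    return this;
  }
  ToyTranslation translation() {
    return new ToyTranslation(text, features, alignment);
  }
}
```

Because ToyTranslation holds no reference to the derivation or the graph, the graph becomes garbage-collectable as soon as the loop over the extractor finishes, which is the encapsulation benefit described in the list.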
>
> There are still a few things broken on the branch, but they are small and I
> am working to fix them. If you have a minute to poke around on the branch,
> please do, so we can make sure the end result is what you had in mind when
> we were chatting the other day.
>
> matt