Sure thing, hope to tonight. matt
> On Apr 27, 2016, at 6:41 PM, kellen sunderland <kellen.sunderl...@gmail.com> wrote:
>
> Hey Matt,
>
> If you had time that would be fantastic. I've created a new PR in case you
> want to pull it in. There's actually 4 tests failing for me currently
> (casing issues causing at least one). If you want to wait until we fix
> these tests that's also completely fine.
>
> -Kellen
>
> On Wed, Apr 27, 2016 at 11:32 AM, Matt Post <p...@cs.jhu.edu> wrote:
>
>> Do you want me to fix the recapitalization? Or are you going to do that? I
>> looked a bit, and it seems I'll have to add a method to get a word
>> alignment object instead of just the string, so that I can poke through
>> them. This approach is as good as true-casing in some languages.
>>
>> A few other things:
>>
>> - I saw a comment in the commit about the changes not working for
>> phrase-based translation. Can you (or Felix) elaborate? What exactly will
>> no longer work?
>>
>> - Currently, there are multiple places where the "output-format" string
>> has to get edited (KBestExtractor and in Translation). After you push your
>> changes in, I'm going to make some edits so that this all occurs in one
>> place.
>>
>> matt
>>
>>
>>> On Apr 27, 2016, at 2:25 PM, kellen sunderland <kellen.sunderl...@gmail.com> wrote:
>>>
>>> Thanks for taking a look Matt,
>>>
>>> I think this is all we've got planned as far as changes relating to an API
>>> would go. We have a few more commits coming but they're just performance
>>> improvements and they don't change too much in the way of interfaces or
>>> method signatures.
>>>
>>> -Kellen
>>>
>>> On Wed, Apr 27, 2016 at 4:47 AM, Matt Post <p...@cs.jhu.edu> wrote:
>>>
>>>> Kellen,
>>>>
>>>> Great. I had a chance to start looking over the ReworkedExtractions
>>>> branch. I'll have some more time today. It looks good to me so far.
>>>> Is there anything else you plan to do, or does that branch contain basically
>>>> all of it (apart from the recapitalization fix, which I see should be
>>>> applied more selectively, maybe only when a -recapitalize flag is present,
>>>> to save on time)?
>>>>
>>>> matt
>>>>
>>>>
>>>>> On Apr 26, 2016, at 1:56 AM, kellen sunderland <kellen.sunderl...@gmail.com> wrote:
>>>>>
>>>>> Hey Matt,
>>>>>
>>>>> I've opened a new pull request with a few of our commits, feel free to take
>>>>> a look when you have some time.
>>>>>
>>>>> More importantly I've pushed our queue of upcoming commits to the following
>>>>> branch in my fork:
>>>>> https://github.com/KellenSunderland/incubator-joshua/commits/ReworkedExtractions
>>>>> From there you can get an idea for the work we've done so far. I
>>>>> haven't opened a PR yet for these commits because there's still some
>>>>> merging I have to do (there's a few failing tests and I had to temporarily
>>>>> comment out some of your casing code). Once that's fixed I'll do a proper
>>>>> PR for these commits.
>>>>>
>>>>> -Kellen
>>>>>
>>>>> On Mon, Apr 25, 2016 at 1:35 PM, Matt Post <p...@cs.jhu.edu> wrote:
>>>>>
>>>>>> Great. On that first point, I meant that translate() would return a
>>>>>> Translation object, which would know its hypergraph and could iterate over
>>>>>> a KBestExtractor. In any case, though, it sounds like you are a bit ahead
>>>>>> of me on this, so I'll wait for a push that I can see, and then we can
>>>>>> converge on the design.
>>>>>>
>>>>>> matt
>>>>>>
>>>>>>
>>>>>>> On Apr 25, 2016, at 4:10 PM, Hieber, Felix <fhie...@amazon.de> wrote:
>>>>>>>
>>>>>>> Hi Matt,
>>>>>>>
>>>>>>> These are some nice suggestions. Most of the work we have done is in
>>>>>>> line with what you propose, so I would agree with Kellen that we should
>>>>>>> synchronize and compare sooner rather than later.
>>>>>>>
>>>>>>> Best,
>>>>>>> Felix
>>>>>>>
>>>>>>>> On 25.04.2016, at 07:44, kellen sunderland <kellen.sunderl...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> Hey Matt,
>>>>>>>>
>>>>>>>> Sorry for the late reply. The Joshua-6 folder and tst may have just been
>>>>>>>> artifacts of some symlinks I have locally. Sorry they may have been pushed
>>>>>>>> by mistake, I can clean that up.
>>>>>>>>
>>>>>>>> Good idea to have the api code in a separate branch. We can merge the work
>>>>>>>> that we've done some time next week.
>>>>>>>>
>>>>>>>> KBestExtractor is one of the things we want to return via the API. We
>>>>>>>> already have some of this implemented though as you suggest. I'll try and
>>>>>>>> push the remaining work we've done into my github branch so you can compare.
>>>>>>>>
>>>>>>>> -Kellen
>>>>>>>>
>>>>>>>>> On Mon, Apr 25, 2016 at 6:11 AM, Matt Post <p...@cs.jhu.edu> wrote:
>>>>>>>>>
>>>>>>>>> Okay, after looking at this a bit more, I have a better understanding, and
>>>>>>>>> an idea for how to move forward.
>>>>>>>>>
>>>>>>>>> First, I see that Translation.java has provisions for structured output.
>>>>>>>>> I'm guessing StructuredTranslation was added by mistake?
>>>>>>>>>
>>>>>>>>> Moving forward, on the joshua_api branch, I was thinking of the following,
>>>>>>>>> but want to make sure it doesn't collide with what you've done or are doing:
>>>>>>>>>
>>>>>>>>> - Factor KBestExtractor to return Translation objects instead of printing,
>>>>>>>>> and also turn it into an iterator
>>>>>>>>>
>>>>>>>>> - There's a real discrepancy with competing forest representations. There
>>>>>>>>> are operations on the hypergraph (via WalkerFunction), and then also
>>>>>>>>> operations on Derivations. This leads to code that operates on both.
>>>>>>>>> It would be nice if the KBestExtractor just returned something like a reduced
>>>>>>>>> "slice" of a forest, with new nodes containing only single back pointers,
>>>>>>>>> representing exactly the nth-best derivation. Then we could generically use
>>>>>>>>> the WalkerFunctions on that (e.g., viterbi extraction), and get rid of many
>>>>>>>>> of the DerivationVisitor classes
>>>>>>>>>
>>>>>>>>> - Related: constructing the k-best list is expensive, even for just the
>>>>>>>>> first item, since you have to set up all the candidate lists and so on.
>>>>>>>>> This led to me implementing top-n = 0, where you can get the translation
>>>>>>>>> and some limited information (not replayed features) via Viterbi extractors
>>>>>>>>> on the hypergraph, and you only have to call KBestExtractor if you actually
>>>>>>>>> want k-best lists. This leads to dual code, e.g., substitutions of
>>>>>>>>> output_format in multiple places. The first item the KBestIterator returns
>>>>>>>>> should be constructed more efficiently, on the assumption that the caller
>>>>>>>>> might not ask for more items. The StructuredTranslation object already is
>>>>>>>>> lazy about returning things that are asked for (e.g., it will only replay
>>>>>>>>> features if you ask for the feature functions).
>>>>>>>>>
>>>>>>>>> I will probably implement most of these tonight and tomorrow unless there
>>>>>>>>> are objections from anyone (including an objection asking for more time to
>>>>>>>>> evaluate!)
>>>>>>>>>
>>>>>>>>> matt
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> On Apr 23, 2016, at 7:22 PM, Matt Post <p...@cs.jhu.edu> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> Kellen suggested we create a Joshua API, which I think is an excellent
>>>>>>>>>> idea. I've just made a start at this.
>>>>>>>>>> It is not done and needs more work,
>>>>>>>>>> but I know that the Amazon folks have done some things on the backend, and
>>>>>>>>>> I wanted to make sure not to duplicate any work they might have done. Also,
>>>>>>>>>> it's something we should discuss.
>>>>>>>>>>
>>>>>>>>>> First, I was a bit confused about the joshua-6 subdirectory, and the
>>>>>>>>>> files there (also, what is tst/? Both of these were from a recent commit).
>>>>>>>>>> I moved those over and then things didn't compile. I got things compiling
>>>>>>>>>> and then made a few changes to StructuredTranslation.
>>>>>>>>>>
>>>>>>>>>> The biggest change I hope doesn't create problems is that I simplified
>>>>>>>>>> StructuredTranslation to no longer contain the Hypergraph object; instead,
>>>>>>>>>> it contains a DerivationState object. This represents a particular k-best
>>>>>>>>>> derivation, using Huang & Chiang (2005)-style ranked back pointers. The
>>>>>>>>>> nice thing is that you can simply define a DerivationVisitor class and
>>>>>>>>>> pass it to DerivationState::visit, and it will see every node in a
>>>>>>>>>> particular derivation.
>>>>>>>>>>
>>>>>>>>>> This is distinct from WalkerFunction, which walks an entire *HyperGraph*.
>>>>>>>>>>
>>>>>>>>>> Let me know what you guys think about these changes, and maybe we can
>>>>>>>>>> spec out the API, and then clean things up inside a bit to use it (there's
>>>>>>>>>> no reason to be passing output stream writers to KBestExtractor, for
>>>>>>>>>> example...).
>>>>>>>>>>
>>>>>>>>>> matt
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Begin forwarded message:
>>>>>>>>>>>
>>>>>>>>>>> From: mjp...@apache.org
>>>>>>>>>>> Subject: incubator-joshua git commit: Simplified StructuredTranslation
>>>>>>>>>>> to use derivations instead of hypergraphs, now using in KBestExtractor
>>>>>>>>>>> Date: April 23, 2016 at 7:12:19 PM EDT
>>>>>>>>>>> To: comm...@joshua.incubator.apache.org
>>>>>>>>>>> Reply-To: dev@joshua.incubator.apache.org
>>>>>>>>>>>
>>>>>>>>>>> Repository: incubator-joshua
>>>>>>>>>>> Updated Branches:
>>>>>>>>>>> refs/heads/joshua_api [created] 824319561
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Simplified StructuredTranslation to use derivations instead of
>>>>>>>>>>> hypergraphs, now using in KBestExtractor
>>>>>>>>>>>
>>>>>>>>>>> The StructuredTranslation object is a great idea. I rewrote it here to
>>>>>>>>>>> do the following:
>>>>>>>>>>>
>>>>>>>>>>> - It now compiles. I'm not sure why it was tucked under
>>>>>>>>>>> $JOSHUA/joshua-6, but I just noticed this, and when I brought it in, it
>>>>>>>>>>> didn't work
>>>>>>>>>>> - I rewrote it to be based on a single (k-best) derivation, instead of
>>>>>>>>>>> knowing about the whole hypergraph. We should also build a more general
>>>>>>>>>>> object that knows about all the StructuredTranslation objects (maybe with
>>>>>>>>>>> some renaming)
>>>>>>>>>>> - I changed it to have an option to only compute each of the items
>>>>>>>>>>> (e.g., features) if it was requested. The non-lazy version remains the
>>>>>>>>>>> default.
>>>>>>>>>>> - KBestExtractor now uses these. This is the first step to making a
>>>>>>>>>>> proper API. My thinking is that a large object (maybe Translation?)
>>>>>>>>>>> will contain the k-best extractor and can return StructuredTranslation objects
>>>>>>>>>>> as requested (again, we may want to jiggle the names a bit)
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Project: http://git-wip-us.apache.org/repos/asf/incubator-joshua/repo
>>>>>>>>>>> Commit: http://git-wip-us.apache.org/repos/asf/incubator-joshua/commit/82431956
>>>>>>>>>>> Tree: http://git-wip-us.apache.org/repos/asf/incubator-joshua/tree/82431956
>>>>>>>>>>> Diff: http://git-wip-us.apache.org/repos/asf/incubator-joshua/diff/82431956
>>>>>>>>>>>
>>>>>>>>>>> Branch: refs/heads/joshua_api
>>>>>>>>>>> Commit: 8243195611a17e0ef067ec7dbf6c4a57612d041b
>>>>>>>>>>> Parents: bc83a1a
>>>>>>>>>>> Author: Matt Post <p...@cs.jhu.edu>
>>>>>>>>>>> Authored: Sat Apr 23 19:12:12 2016 -0400
>>>>>>>>>>> Committer: Matt Post <p...@cs.jhu.edu>
>>>>>>>>>>> Committed: Sat Apr 23 19:12:12 2016 -0400
>>>>>>>>>>>
>>>>>>>>>>> ----------------------------------------------------------------------
>>>>>>>>>>> src/joshua/decoder/StructuredTranslation.java | 144 ++++++++++---------
>>>>>>>>>>> .../decoder/hypergraph/KBestExtractor.java | 47 +++---
>>>>>>>>>>> 2 files changed, 98 insertions(+), 93 deletions(-)
>>>>>>>>>>> ----------------------------------------------------------------------
>>>>>>>>>>>
>>>>>>>>>>> http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/82431956/src/joshua/decoder/StructuredTranslation.java
>>>>>>>>>>> ----------------------------------------------------------------------
>>>>>>>>>>> diff --git a/src/joshua/decoder/StructuredTranslation.java b/src/joshua/decoder/StructuredTranslation.java
>>>>>>>>>>> index 1939ea0..e3018b4 100644
>>>>>>>>>>> --- a/src/joshua/decoder/StructuredTranslation.java
>>>>>>>>>>> +++ b/src/joshua/decoder/StructuredTranslation.java
>>>>>>>>>>> @@ -10,7 +10,10 @@ import java.util.List;
>>>>>>>>>>> import java.util.Map;
>>>>>>>>>>>
>>>>>>>>>>> import joshua.decoder.ff.FeatureFunction; >>>>>>>>>>> +import joshua.decoder.ff.FeatureVector; >>>>>>>>>>> import joshua.decoder.hypergraph.HyperGraph; >>>>>>>>>>> +import joshua.decoder.hypergraph.KBestExtractor.DerivationState; >>>>>>>>>>> +import joshua.decoder.io.DeNormalize; >>>>>>>>>>> import >>>> joshua.decoder.hypergraph.ViterbiFeatureVectorWalkerFunction; >>>>>>>>>>> import >> joshua.decoder.hypergraph.ViterbiOutputStringWalkerFunction; >>>>>>>>>>> import joshua.decoder.hypergraph.WalkerFunction; >>>>>>>>>>> @@ -30,77 +33,51 @@ import joshua.decoder.segment_file.Sentence; >>>>>>>>>>> public class StructuredTranslation { >>>>>>>>>>> >>>>>>>>>>> private final Sentence sourceSentence; >>>>>>>>>>> - private final List<FeatureFunction> featureFunctions; >>>>>>>>>>> + private final DerivationState derivationRoot; >>>>>>>>>>> + private final JoshuaConfiguration joshuaConfiguration; >>>>>>>>>>> >>>>>>>>>>> - private final String translationString; >>>>>>>>>>> - private final List<String> translationTokens; >>>>>>>>>>> - private final float translationScore; >>>>>>>>>>> - private List<List<Integer>> translationWordAlignments; >>>>>>>>>>> - private Map<String,Float> translationFeatures; >>>>>>>>>>> - private final float extractionTime; >>>>>>>>>>> + private String translationString = null; >>>>>>>>>>> + private List<String> translationTokens = null; >>>>>>>>>>> + private String translationWordAlignments = null; >>>>>>>>>>> + private FeatureVector translationFeatures = null; >>>>>>>>>>> + private float extractionTime = 0.0f; >>>>>>>>>>> + private float translationScore = 0.0f; >>>>>>>>>>> >>>>>>>>>>> + /* If we need to replay the features, this will get set to >> true, >>>>>> so >>>>>>>>> that it's only done once */ >>>>>>>>>>> + private boolean featuresReplayed = false; >>>>>>>>>>> + >>>>>>>>>>> public StructuredTranslation(final Sentence sourceSentence, >>>>>>>>>>> - final HyperGraph hypergraph, >>>>>>>>>>> - final List<FeatureFunction> 
featureFunctions) { >>>>>>>>>>> - >>>>>>>>>>> - final long startTime = System.currentTimeMillis(); >>>>>>>>>>> - >>>>>>>>>>> - this.sourceSentence = sourceSentence; >>>>>>>>>>> - this.featureFunctions = featureFunctions; >>>>>>>>>>> - this.translationString = extractViterbiString(hypergraph); >>>>>>>>>>> - this.translationTokens = extractTranslationTokens(); >>>>>>>>>>> - this.translationScore = >> extractTranslationScore(hypergraph); >>>>>>>>>>> - this.translationFeatures = >>>> extractViterbiFeatures(hypergraph); >>>>>>>>>>> - this.translationWordAlignments = >>>>>>>>> extractViterbiWordAlignment(hypergraph); >>>>>>>>>>> - this.extractionTime = (System.currentTimeMillis() - >>>>>> startTime) / >>>>>>>>> 1000.0f; >>>>>>>>>>> - } >>>>>>>>>>> - >>>>>>>>>>> - private Map<String,Float> extractViterbiFeatures(final >>>> HyperGraph >>>>>>>>> hypergraph) { >>>>>>>>>>> - if (hypergraph == null) { >>>>>>>>>>> - return emptyMap(); >>>>>>>>>>> - } else { >>>>>>>>>>> - ViterbiFeatureVectorWalkerFunction >>>> viterbiFeatureVectorWalker >>>>>> = >>>>>>>>> new ViterbiFeatureVectorWalkerFunction(featureFunctions, >>>>>> sourceSentence); >>>>>>>>>>> - walk(hypergraph.goalNode, viterbiFeatureVectorWalker); >>>>>>>>>>> - return new >>>>>>>>> HashMap<String,Float>(viterbiFeatureVectorWalker.getFeaturesMap()); >>>>>>>>>>> - } >>>>>>>>>>> - } >>>>>>>>>>> + final DerivationState derivationRoot, >>>>>>>>>>> + JoshuaConfiguration config) { >>>>>>>>>>> >>>>>>>>>>> - private List<List<Integer>> extractViterbiWordAlignment(final >>>>>>>>> HyperGraph hypergraph) { >>>>>>>>>>> - if (hypergraph == null) { >>>>>>>>>>> - return emptyList(); >>>>>>>>>>> - } else { >>>>>>>>>>> - final WordAlignmentExtractor wordAlignmentWalker = new >>>>>>>>> WordAlignmentExtractor(); >>>>>>>>>>> - walk(hypergraph.goalNode, wordAlignmentWalker); >>>>>>>>>>> - return wordAlignmentWalker.getFinalWordAlignments(); >>>>>>>>>>> - } >>>>>>>>>>> - } >>>>>>>>>>> - >>>>>>>>>>> - private float extractTranslationScore(final 
HyperGraph >>>>>> hypergraph) { >>>>>>>>>>> - if (hypergraph == null) { >>>>>>>>>>> - return 0; >>>>>>>>>>> - } else { >>>>>>>>>>> - return hypergraph.goalNode.getScore(); >>>>>>>>>>> - } >>>>>>>>>>> - } >>>>>>>>>>> - >>>>>>>>>>> - private String extractViterbiString(final HyperGraph >>>> hypergraph) { >>>>>>>>>>> - if (hypergraph == null) { >>>>>>>>>>> - return sourceSentence.source(); >>>>>>>>>>> - } else { >>>>>>>>>>> - final WalkerFunction viterbiOutputStringWalker = new >>>>>>>>> ViterbiOutputStringWalkerFunction(); >>>>>>>>>>> - walk(hypergraph.goalNode, viterbiOutputStringWalker); >>>>>>>>>>> - return viterbiOutputStringWalker.toString(); >>>>>>>>>>> - } >>>>>>>>>>> + this(sourceSentence, derivationRoot, config, true); >>>>>>>>>>> } >>>>>>>>>>> + >>>>>>>>>>> >>>>>>>>>>> - private List<String> extractTranslationTokens() { >>>>>>>>>>> - if (translationString.isEmpty()) { >>>>>>>>>>> - return emptyList(); >>>>>>>>>>> - } else { >>>>>>>>>>> - return asList(translationString.split("\\s+")); >>>>>>>>>>> + public StructuredTranslation(final Sentence sourceSentence, >>>>>>>>>>> + final DerivationState derivationRoot, >>>>>>>>>>> + JoshuaConfiguration config, >>>>>>>>>>> + boolean now) { >>>>>>>>>>> + >>>>>>>>>>> + final long startTime = System.currentTimeMillis(); >>>>>>>>>>> + >>>>>>>>>>> + this.sourceSentence = sourceSentence; >>>>>>>>>>> + this.derivationRoot = derivationRoot; >>>>>>>>>>> + this.joshuaConfiguration = config; >>>>>>>>>>> + >>>>>>>>>>> + if (now) { >>>>>>>>>>> + getTranslationString(); >>>>>>>>>>> + getTranslationTokens(); >>>>>>>>>>> + getTranslationScore(); >>>>>>>>>>> + getTranslationFeatures(); >>>>>>>>>>> + getTranslationWordAlignments(); >>>>>>>>>>> } >>>>>>>>>>> + this.translationScore = getTranslationScore(); >>>>>>>>>>> + >>>>>>>>>>> + this.extractionTime = (System.currentTimeMillis() - >>>> startTime) / >>>>>>>>> 1000.0f; >>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>>> + >>>>>>>>>>> // Getters to use upstream >>>>>>>>>>> >>>>>>>>>>> public Sentence 
getSourceSentence() { >>>>>>>>>>> @@ -112,25 +89,60 @@ public class StructuredTranslation { >>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>>> public String getTranslationString() { >>>>>>>>>>> - return translationString; >>>>>>>>>>> + if (this.translationString == null) { >>>>>>>>>>> + if (derivationRoot == null) { >>>>>>>>>>> + this.translationString = sourceSentence.source(); >>>>>>>>>>> + } else { >>>>>>>>>>> + this.translationString = derivationRoot.getHypothesis(); >>>>>>>>>>> + } >>>>>>>>>>> + } >>>>>>>>>>> + return this.translationString; >>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>>> public List<String> getTranslationTokens() { >>>>>>>>>>> + if (this.translationTokens == null) { >>>>>>>>>>> + String trans = getTranslationString(); >>>>>>>>>>> + if (trans.isEmpty()) { >>>>>>>>>>> + this.translationTokens = emptyList(); >>>>>>>>>>> + } else { >>>>>>>>>>> + this.translationTokens = asList(trans.split("\\s+")); >>>>>>>>>>> + } >>>>>>>>>>> + } >>>>>>>>>>> + >>>>>>>>>>> return translationTokens; >>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>>> public float getTranslationScore() { >>>>>>>>>>> + if (derivationRoot == null) { >>>>>>>>>>> + this.translationScore = 0.0f; >>>>>>>>>>> + } else { >>>>>>>>>>> + this.translationScore = derivationRoot.getModelCost(); >>>>>>>>>>> + } >>>>>>>>>>> + >>>>>>>>>>> return translationScore; >>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>>> /** >>>>>>>>>>> * Returns a list of target to source alignments. 
>>>>>>>>>>> */ >>>>>>>>>>> - public List<List<Integer>> getTranslationWordAlignments() { >>>>>>>>>>> - return translationWordAlignments; >>>>>>>>>>> + public String getTranslationWordAlignments() { >>>>>>>>>>> + if (this.translationWordAlignments == null) { >>>>>>>>>>> + if (derivationRoot == null) >>>>>>>>>>> + this.translationWordAlignments = ""; >>>>>>>>>>> + else { >>>>>>>>>>> + WordAlignmentExtractor wordAlignmentExtractor = new >>>>>>>>> WordAlignmentExtractor(); >>>>>>>>>>> + derivationRoot.visit(wordAlignmentExtractor); >>>>>>>>>>> + this.translationWordAlignments = >>>>>>>>> wordAlignmentExtractor.toString(); >>>>>>>>>>> + } >>>>>>>>>>> + } >>>>>>>>>>> + >>>>>>>>>>> + return this.translationWordAlignments; >>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>>> - public Map<String,Float> getTranslationFeatures() { >>>>>>>>>>> + public FeatureVector getTranslationFeatures() { >>>>>>>>>>> + if (this.translationFeatures == null) >>>>>>>>>>> + this.translationFeatures = >> derivationRoot.replayFeatures(); >>>>>>>>>>> + >>>>>>>>>>> return translationFeatures; >>>>>>>>>>> } >>>>>>>>> >>>>>> >>>> >> http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/82431956/src/joshua/decoder/hypergraph/KBestExtractor.java >>>>>>>>>>> >>>>>> ---------------------------------------------------------------------- >>>>>>>>>>> diff --git a/src/joshua/decoder/hypergraph/KBestExtractor.java >>>>>>>>> b/src/joshua/decoder/hypergraph/KBestExtractor.java >>>>>>>>>>> index 42539cc..ea6ca73 100644 >>>>>>>>>>> --- a/src/joshua/decoder/hypergraph/KBestExtractor.java >>>>>>>>>>> +++ b/src/joshua/decoder/hypergraph/KBestExtractor.java >>>>>>>>>>> @@ -34,6 +34,7 @@ import java.util.regex.Matcher; >>>>>>>>>>> import joshua.corpus.Vocabulary; >>>>>>>>>>> import joshua.decoder.BLEU; >>>>>>>>>>> import joshua.decoder.JoshuaConfiguration; >>>>>>>>>>> +import joshua.decoder.StructuredTranslation; >>>>>>>>>>> import joshua.decoder.chart_parser.ComputeNodeResult; >>>>>>>>>>> import 
joshua.decoder.ff.FeatureFunction; >>>>>>>>>>> import joshua.decoder.ff.FeatureVector; >>>>>>>>>>> @@ -167,33 +168,25 @@ public class KBestExtractor { >>>>>>>>>>> // Determine the k-best hypotheses at each HGNode >>>>>>>>>>> VirtualNode virtualNode = getVirtualNode(node); >>>>>>>>>>> DerivationState derivationState = >>>>>>>>> virtualNode.lazyKBestExtractOnNode(this, k); >>>>>>>>>>> + >>>>>>>>>>> // DerivationState derivationState = getKthDerivation(node, >> k); >>>>>>>>>>> if (derivationState != null) { >>>>>>>>>>> - // ==== read the kbest from each hgnode and convert to >>>> output >>>>>>>>> format >>>>>>>>>>> - FeatureVector features = new FeatureVector(); >>>>>>>>>>> >>>>>>>>>>> - /* >>>>>>>>>>> - * To save space, the decoder only stores the model cost, >> no >>>>>> the >>>>>>>>> individual feature values. If >>>>>>>>>>> - * you want to output them, you have to replay them. >>>>>>>>>>> - */ >>>>>>>>>>> - String hypothesis = null; >>>>>>>>>>> - if (joshuaConfiguration.outputFormat.contains("%f") >>>>>>>>>>> - || joshuaConfiguration.outputFormat.contains("%d")) >>>>>>>>>>> - features = derivationState.replayFeatures(); >>>>>>>>>>> - >>>>>>>>>>> - hypothesis = derivationState.getHypothesis() >>>>>>>>>>> + StructuredTranslation translation = new >>>> StructuredTranslation( >>>>>>>>>>> + sentence, derivationState, joshuaConfiguration); >>>>>>>>>>> + >>>>>>>>>>> + String hypothesis = translation.getTranslationString() >>>>>>>>>>> .replaceAll("-lsb-", "[") >>>>>>>>>>> .replaceAll("-rsb-", "]") >>>>>>>>>>> .replaceAll("-pipe-", "|"); >>>>>>>>>>> >>>>>>>>>>> - >>>>>>>>>>> outputString = joshuaConfiguration.outputFormat >>>>>>>>>>> .replace("%k", Integer.toString(k)) >>>>>>>>>>> .replace("%s", hypothesis) >>>>>>>>>>> .replace("%S", DeNormalize.processSingleLine(hypothesis)) >>>>>>>>>>> .replace("%i", Integer.toString(sentence.id())) >>>>>>>>>>> - .replace("%f", joshuaConfiguration.moses ? 
>>>>>>>>> features.mosesString() : features.toString()) >>>>>>>>>>> - .replace("%c", String.format("%.3f", >>>>>> derivationState.cost)); >>>>>>>>>>> + .replace("%f", joshuaConfiguration.moses ? >>>>>>>>> translation.getTranslationFeatures().mosesString() : >>>>>>>>> translation.getTranslationFeatures().toString()) >>>>>>>>>>> + .replace("%c", String.format("%.3f", >>>>>>>>> translation.getTranslationScore())); >>>>>>>>>>> >>>>>>>>>>> if (joshuaConfiguration.outputFormat.contains("%t")) { >>>>>>>>>>> outputString = outputString.replace("%t", >>>>>>>>> derivationState.getTree()); >>>>>>>>>>> @@ -250,11 +243,11 @@ public class KBestExtractor { >>>>>>>>>>> return; >>>>>>>>>>> >>>>>>>>>>> for (int k = 1; k <= topN; k++) { >>>>>>>>>>> - String hypStr = getKthHyp(hg.goalNode, k); >>>>>>>>>>> - if (null == hypStr) >>>>>>>>>>> + String translation = getKthHyp(hg.goalNode, k); >>>>>>>>>>> + if (null == translation) >>>>>>>>>>> break; >>>>>>>>>>> >>>>>>>>>>> - out.write(hypStr); >>>>>>>>>>> + out.write(translation); >>>>>>>>>>> out.write("\n"); >>>>>>>>>>> out.flush(); >>>>>>>>>>> } >>>>>>>>>>> @@ -704,11 +697,11 @@ public class KBestExtractor { >>>>>>>>>>> /** >>>>>>>>>>> * Visits every state in the derivation in a depth-first order. 
>>>>>>>>>>> */ >>>>>>>>>>> - private DerivationVisitor visit(DerivationVisitor visitor) { >>>>>>>>>>> + public DerivationVisitor visit(DerivationVisitor visitor) { >>>>>>>>>>> return visit(visitor, 0); >>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>>> - private DerivationVisitor visit(DerivationVisitor visitor, >> int >>>>>>>>> indent) { >>>>>>>>>>> + public DerivationVisitor visit(DerivationVisitor visitor, >> int >>>>>>>>> indent) { >>>>>>>>>>> >>>>>>>>>>> visitor.before(this, indent); >>>>>>>>>>> >>>>>>>>>>> @@ -733,25 +726,25 @@ public class KBestExtractor { >>>>>>>>>>> return visitor; >>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>>> - private String getHypothesis() { >>>>>>>>>>> + public String getHypothesis() { >>>>>>>>>>> return getHypothesis(defaultSide); >>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>>> - private String getTree() { >>>>>>>>>>> + public String getTree() { >>>>>>>>>>> return visit(new TreeExtractor()).toString(); >>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>>> - private String getHypothesis(Side side) { >>>>>>>>>>> + public String getHypothesis(Side side) { >>>>>>>>>>> return visit(new HypothesisExtractor(side)).toString(); >>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>>> - private FeatureVector replayFeatures() { >>>>>>>>>>> + public FeatureVector replayFeatures() { >>>>>>>>>>> FeatureReplayer fp = new FeatureReplayer(); >>>>>>>>>>> visit(fp); >>>>>>>>>>> return fp.getFeatures(); >>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>>> - private String getDerivation() { >>>>>>>>>>> + public String getDerivation() { >>>>>>>>>>> return visit(new DerivationExtractor()).toString(); >>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>>> @@ -811,7 +804,7 @@ public class KBestExtractor { >>>>>>>>>>> */ >>>>>>>>>>> void after(DerivationState state, int level); >>>>>>>>>>> } >>>>>>>>>>> - >>>>>>>>>>> + >>>>>>>>>>> /** >>>>>>>>>>> * Extracts the hypothesis from the leaves of the tree using the >>>>>>>>> generic (depth-first) visitor. 
>>>>>>>>>>> * Since we're using the visitor, we can't just print out the >> words >>>> as >>>>>>>>> we see them. We have to >>>>>>>>>>> @@ -878,7 +871,7 @@ public class KBestExtractor { >>>>>>>>>>> return outputs.pop().replaceAll("<s> ", "").replace(" </s>", >> ""); >>>>>>>>>>> } >>>>>>>>>>> } >>>>>>>>>>> - >>>>>>>>>>> + >>>>>>>>>>> /** >>>>>>>>>>> * Assembles a Penn treebank format tree for a given derivation. >>>>>>>>>>> */ >>>>>>>>> >>>>>>>>> >>>>>>> >>>>>> >>>>>> >>>> >>>> >> >>
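
For reference, the design discussed in this thread (a k-best extractor that hands back translation objects through an iterator instead of printing them, with expensive fields such as replayed features computed only on request) could look roughly like the sketch below. It is a minimal, self-contained illustration and not Joshua code: KBestSketch, Derivation, LazyTranslation and this KBestIterator are hypothetical names, and the ranked list is precomputed here, whereas a real extractor would presumably produce each derivation lazily as well.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class KBestSketch {

  /** Hypothetical stand-in for one ranked derivation (not Joshua's DerivationState). */
  static final class Derivation {
    final String hypothesis;
    final float cost;
    int featureReplays = 0; // how many times the expensive replay actually ran

    Derivation(String hypothesis, float cost) {
      this.hypothesis = hypothesis;
      this.cost = cost;
    }

    /** Placeholder for an expensive computation such as replaying features. */
    String replayFeatures() {
      featureReplays++;
      return "cost=" + cost;
    }
  }

  /** Result object whose expensive fields are computed on first request, then cached. */
  static final class LazyTranslation {
    private final Derivation derivation;
    private List<String> tokens = null;
    private String features = null;

    LazyTranslation(Derivation derivation) {
      this.derivation = derivation;
    }

    String getString() {
      return derivation.hypothesis;
    }

    float getScore() {
      return derivation.cost;
    }

    List<String> getTokens() {
      if (tokens == null) {
        String s = getString();
        tokens = s.isEmpty()
            ? Collections.<String>emptyList()
            : Arrays.asList(s.split("\\s+"));
      }
      return tokens;
    }

    String getFeatures() {
      if (features == null) {
        features = derivation.replayFeatures(); // runs at most once per translation
      }
      return features;
    }
  }

  /** Hands out at most topN translations, one per call, instead of printing them. */
  static final class KBestIterator implements Iterator<LazyTranslation> {
    private final List<Derivation> ranked;
    private final int topN;
    private int position = 0;

    KBestIterator(List<Derivation> ranked, int topN) {
      this.ranked = ranked;
      this.topN = topN;
    }

    @Override
    public boolean hasNext() {
      return position < topN && position < ranked.size();
    }

    @Override
    public LazyTranslation next() {
      if (!hasNext()) {
        throw new NoSuchElementException();
      }
      return new LazyTranslation(ranked.get(position++));
    }
  }

  public static void main(String[] args) {
    List<Derivation> ranked = Arrays.asList(
        new Derivation("the house", -1.5f),
        new Derivation("a house", -2.0f));

    // The caller decides the output format; features are never replayed here.
    Iterator<LazyTranslation> it = new KBestIterator(ranked, 2);
    while (it.hasNext()) {
      LazyTranslation t = it.next();
      System.out.println(t.getString() + " ||| " + t.getScore());
    }
  }
}
```

With this shape, a caller that only wants the 1-best string pays nothing for feature replay, and formatting (the output-format substitutions) can live in one place on the caller's side.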