Re: joshua_api

Matt Post Wed, 27 Apr 2016 17:45:06 -0700

Sure thing, hope to tonight.

matt



> On Apr 27, 2016, at 6:41 PM, kellen sunderland <kellen.sunderl...@gmail.com> 
> wrote:
> 
> Hey Matt,
> 
> If you had time that would be fantastic.  I've created a new PR in case you
> want to pull it in.  There's actually 4 tests failing for me currently
> (casing issues causing at least one).  If you want to wait until we fix
> these tests that's also completely fine.
> 
> -Kellen
> 
> On Wed, Apr 27, 2016 at 11:32 AM, Matt Post <p...@cs.jhu.edu> wrote:
> 
>> Do you want me to fix the recapitalization? Or are you going to do that? I
>> looked a bit, and it seems I'll have to add a method to get a word
>> alignment object instead of just the string, so that I can poke through
>> them. This approach is as good as true-casing in some languages.
>> 
>> A few other things:
>> 
>> - I saw a comment in the commit about the changes not working for
>> phrase-based translation. Can you (or Felix) elaborate? What exactly will
>> no longer work?
>> 
>> - Currently, there are multiple places where the "output-format" string
>> has to get edited (KBestExtractor and in Translation). After you push your
>> changes in, I'm going to make some edits so that this all occurs in one
>> place.
>> 
>> matt
>> 
>> 
>>> On Apr 27, 2016, at 2:25 PM, kellen sunderland <
>> kellen.sunderl...@gmail.com> wrote:
>>> 
>>> Thanks for taking a look Matt,
>>> 
>>> I think this is all we've got planned as far as changes relating to an
>> API
>>> would go.  We have a few more commits coming but they're just performance
>>> improvements and they don't change too much in the way of interfaces or
>>> method signatures.
>>> 
>>> -Kellen
>>> 
>>> On Wed, Apr 27, 2016 at 4:47 AM, Matt Post <p...@cs.jhu.edu> wrote:
>>> 
>>>> Kellen,
>>>> 
>>>> Great. I had a chance to start looking over the ReworkedExtractions
>>>> branch. I'll have some more time today. It looks good to me so far. Is
>>>> there anything else you plan to do, or does that branch contain
>> basically
>>>> all of it (apart from the recapitalization fix, which I see should be
>>>> applied more selectively, maybe only when a -recapitalize flag is
>> present,
>>>> to save on time)?
>>>> 
>>>> matt
>>>> 
>>>> 
>>>>> On Apr 26, 2016, at 1:56 AM, kellen sunderland <
>>>> kellen.sunderl...@gmail.com> wrote:
>>>>> 
>>>>> Hey Matt,
>>>>> 
>>>>> I've opened a new pull request with a few of our commits, feel free to
>>>> take
>>>>> a look when you have some time.
>>>>> 
>>>>> More importantly I've pushed our queue of upcoming commits to the
>>>> following
>>>>> branch in my fork:
>>>>> 
>>>> 
>> https://github.com/KellenSunderland/incubator-joshua/commits/ReworkedExtractions
>>>>> .  From there you can get an idea for the work we've done so far.  I
>>>>> haven't opened a PR yet for these commits because there's still some
>>>>> merging I have to do (there's a few failing tests and I had to
>>>> temporarily
>>>>> comment out some of your casing code).  Once that's fixed I'll do a
>>>> proper
>>>>> PR for these commits.
>>>>> 
>>>>> -Kellen
>>>>> 
>>>>> On Mon, Apr 25, 2016 at 1:35 PM, Matt Post <p...@cs.jhu.edu> wrote:
>>>>> 
>>>>>> Great. On that first point, I meant that translate() would return a
>>>>>> Translation object, which would know its hypergraph and could iterate
>>>> over
>>>>>> a KBestExtractor. In any case, though, it sounds like you are a bit
>>>> ahead
>>>>>> of me on this, so I'll wait for a push that I can see, and then we can
>>>>>> converge on the design.
>>>>>> 
>>>>>> matt
>>>>>> 
>>>>>> 
>>>>>>> On Apr 25, 2016, at 4:10 PM, Hieber, Felix <fhie...@amazon.de>
>> wrote:
>>>>>>> 
>>>>>>> Hi Matt,
>>>>>>> 
>>>>>>> These are some nice suggestions. Most of the work we have done is in
>>>>>> line with what you propose, so I would agree with Kellen that we should
>>>>>> synchronize and compare sooner rather than later.
>>>>>>> 
>>>>>>> Best,
>>>>>>> Felix
>>>>>>> 
>>>>>>>> On 25.04.2016, at 07:44, kellen sunderland <
>>>> kellen.sunderl...@gmail.com>
>>>>>> wrote:
>>>>>>>> 
>>>>>>>> Hey Matt,
>>>>>>>> 
>>>>>>>> Sorry for the late reply.  The Joshua-6 folder and tst may have just
>>>>>> been
>>>>>>>> artifacts of some symlinks I have locally.  Sorry they may have been
>>>>>> pushed
>>>>>>>> by mistake, I can clean that up.
>>>>>>>> 
>>>>>>>> Good idea to have the api code in a separate branch.  We can merge
>> the
>>>>>> work
>>>>>>>> that we've done some time next week.
>>>>>>>> 
>>>>>>>> KBestExtractor is one of the things we want to return via the API.
>> We
>>>>>>>> already have some of this implemented though as you suggest.  I'll
>> try
>>>>>> and
>>>>>>>> push the remaining work we've done into my github branch so you can
>>>>>> compare.
>>>>>>>> 
>>>>>>>> -Kellen
>>>>>>>> 
>>>>>>>>> On Mon, Apr 25, 2016 at 6:11 AM, Matt Post <p...@cs.jhu.edu>
>> wrote:
>>>>>>>>> 
>>>>>>>>> Okay, after looking at this a bit more, I have a better
>>>> understanding,
>>>>>> and
>>>>>>>>> an idea for how to move forward.
>>>>>>>>> 
>>>>>>>>> First, I see that Translation.java has provisions for structured
>>>>>> output.
>>>>>>>>> I'm guessing StructuredTranslation was added by mistake?
>>>>>>>>> 
>>>>>>>>> Moving forward, on the joshua_api branch, I was thinking of the
>>>>>> following,
>>>>>>>>> but want to make sure it doesn't collide with what you've done or
>> are
>>>>>> doing:
>>>>>>>>> 
>>>>>>>>> - Factor KBestExtractor to return Translation objects instead of
>>>>>> printing,
>>>>>>>>> and also turn it into an iterator
>>>>>>>>> 
>>>>>>>>> - There's a real discrepancy with competing forest representations.
>>>>>> There
>>>>>>>>> are operations on the hypergraph (via WalkerFunction), and then
>> also
>>>>>>>>> operations on Derivations. This leads to code that operates on
>> both.
>>>> It
>>>>>>>>> would be nice if the KBestExtractor just returned something like a
>>>>>> reduced
>>>>>>>>> "slice" of a forest, with new nodes containing only single back
>>>>>> pointers,
>>>>>>>>> representing exactly the nth-best derivation. Then we could
>>>>>> generically use
>>>>>>>>> the WalkerFunctions on that (e.g., viterbi extraction), and get rid
>>>> of
>>>>>> many
>>>>>>>>> of the DerivationVisitor classes
>>>>>>>>> 
>>>>>>>>> - Related: constructing the k-best list is expensive, even for just
>>>> the
>>>>>>>>> first item, since you have to set up all the candidate lists and so
>>>> on.
>>>>>>>>> This led to me implementing top-n = 0, where you can get the
>>>>>> translation
>>>>>>>>> and some limited information (not replayed features) via Viterbi
>>>>>> extractors
>>>>>>>>> on the hypergraph, and you only have to call KBestExtractor if you
>>>>>> actually
>>>>>>>>> want k-best lists. This leads to dual code, e.g., substitutions of
>>>>>>>>> output_format in multiple places. The first item the KBestIterator
>>>>>> returns
>>>>>>>>> should be constructed more efficiently, on the assumption that the
>>>>>> caller
>>>>>>>>> might not ask for more items. The StructuredTranslation object
>>>> already
>>>>>> is
>>>>>>>>> lazy about returning things that are asked for (e.g., it will only
>>>>>> replay
>>>>>>>>> features if you ask for the feature functions).
>>>>>>>>> 
>>>>>>>>> I will probably implement most of these tonight and tomorrow unless
>>>>>> there
>>>>>>>>> are objections from anyone (including an objection asking for more
>>>>>> time to
>>>>>>>>> evaluate!)
>>>>>>>>> 
>>>>>>>>> matt
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> On Apr 23, 2016, at 7:22 PM, Matt Post <p...@cs.jhu.edu> wrote:
>>>>>>>>>> 
>>>>>>>>>> Hi,
>>>>>>>>>> 
>>>>>>>>>> Kellen suggested we create a Joshua API, which I think is an
>>>> excellent
>>>>>>>>> idea. I've just made a start at this. It is not done and needs more
>>>>>> work,
>>>>>>>>> but I know that the Amazon folks have done some things on the
>>>> backend,
>>>>>> and
>>>>>>>>> I wanted to make sure not to duplicate any work they might have
>> done.
>>>>>> Also,
>>>>>>>>> it's something we should discuss.
>>>>>>>>>> 
>>>>>>>>>> First, I was a bit confused about the joshua-6 subdirectory, and
>> the
>>>>>>>>> files there (also, what is tst/? Both of these were from a recent
>>>>>> commit).
>>>>>>>>> I moved those over and then things didn't compile. I got things
>>>>>> compiling
>>>>>>>>> and then made a few changes to StructuredTranslation.
>>>>>>>>>> 
>>>>>>>>>> The biggest change I hope doesn't create problems is that I
>>>> simplified
>>>>>>>>> StructuredTranslation to no longer contain the Hypergraph object;
>>>>>> instead,
>>>>>>>>> it contains a DerivationState object. This represents a particular
>>>>>> k-best
>>>>>>>>> derivation, using Huang & Chiang (2005)-style ranked back pointers.
>>>> The
>>>>>>>>> nice thing is that you can simply define a DerivationVisitor
>> class
>>>>>> and
>>>>>>>>> pass it to DerivationState::visit, and it will see every node in a
>>>>>>>>> particular derivation.
>>>>>>>>>> 
>>>>>>>>>> This is distinct from WalkerFunction, which walks an entire
>>>>>> *HyperGraph*.
>>>>>>>>>> 
>>>>>>>>>> Let me know what you guys think about these changes, and maybe we
>>>> can
>>>>>>>>> spec out the API, and then clean things up inside a bit to use it
>>>>>> (there's
>>>>>>>>> no reason to be passing output stream writers to KBestExtractor,
>> for
>>>>>>>>> example...).
>>>>>>>>>> 
>>>>>>>>>> matt
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> Begin forwarded message:
>>>>>>>>>>> 
>>>>>>>>>>> From: mjp...@apache.org
>>>>>>>>>>> Subject: incubator-joshua git commit: Simplified
>>>>>> StructuredTranslation
>>>>>>>>> to use derivations instead of hypergraphs, now using in
>>>> KBestExtractor
>>>>>>>>>>> Date: April 23, 2016 at 7:12:19 PM EDT
>>>>>>>>>>> To: comm...@joshua.incubator.apache.org
>>>>>>>>>>> Reply-To: dev@joshua.incubator.apache.org
>>>>>>>>>>> 
>>>>>>>>>>> Repository: incubator-joshua
>>>>>>>>>>> Updated Branches:
>>>>>>>>>>> refs/heads/joshua_api [created] 824319561
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> Simplified StructuredTranslation to use derivations instead of
>>>>>>>>> hypergraphs, now using in KBestExtractor
>>>>>>>>>>> 
>>>>>>>>>>> The StructuredTranslation object is a great idea. I rewrote it
>> here
>>>>>> to
>>>>>>>>> do the following:
>>>>>>>>>>> 
>>>>>>>>>>> - It now compiles. I'm not sure why it was tucked under
>>>>>>>>> $JOSHUA/joshua-6, but I just noticed this, and when I brought it
>> in,
>>>> it
>>>>>>>>> didn't work
>>>>>>>>>>> -  I rewrote it to be based on a single (k-best) derivation,
>>>> instead
>>>>>> of
>>>>>>>>> knowing about the whole hypergraph. We should also build a more
>>>> general
>>>>>>>>> object that knows about all the StructuredTranslation objects
>> (maybe
>>>>>> with
>>>>>>>>> some renaming)
>>>>>>>>>>> -  I changed it to have an option to only compute each of the
>> items
>>>>>>>>> (e.g., features) if it was requested. The non-lazy version remains
>>>> the
>>>>>>>>> default.
>>>>>>>>>>> -  KBestExtractor now uses these. This is the first step to
>> making
>>>> a
>>>>>>>>> proper API. My thinking is that a large object (maybe Translation?)
>>>>>> will
>>>>>>>>> contain the k-best extractor and can return StructuredTranslation
>>>>>> objects
>>>>>>>>> as requested (again, we may want to jiggle the names a bit)
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> Project:
>>>>>> http://git-wip-us.apache.org/repos/asf/incubator-joshua/repo
>>>>>>>>>>> Commit:
>>>>>>>>> 
>>>>>> 
>> http://git-wip-us.apache.org/repos/asf/incubator-joshua/commit/82431956
>>>>>>>>>>> Tree:
>>>>>>>>> 
>>>> http://git-wip-us.apache.org/repos/asf/incubator-joshua/tree/82431956
>>>>>>>>>>> Diff:
>>>>>>>>> 
>>>> http://git-wip-us.apache.org/repos/asf/incubator-joshua/diff/82431956
>>>>>>>>>>> 
>>>>>>>>>>> Branch: refs/heads/joshua_api
>>>>>>>>>>> Commit: 8243195611a17e0ef067ec7dbf6c4a57612d041b
>>>>>>>>>>> Parents: bc83a1a
>>>>>>>>>>> Author: Matt Post <p...@cs.jhu.edu>
>>>>>>>>>>> Authored: Sat Apr 23 19:12:12 2016 -0400
>>>>>>>>>>> Committer: Matt Post <p...@cs.jhu.edu>
>>>>>>>>>>> Committed: Sat Apr 23 19:12:12 2016 -0400
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>> ----------------------------------------------------------------------
>>>>>>>>>>> src/joshua/decoder/StructuredTranslation.java   | 144
>>>>>>>>> ++++++++++---------
>>>>>>>>>>> .../decoder/hypergraph/KBestExtractor.java      |  47 +++---
>>>>>>>>>>> 2 files changed, 98 insertions(+), 93 deletions(-)
>>>>>>>>>>> 
>>>>>> ----------------------------------------------------------------------
>>>>>>>>> 
>>>>>> 
>>>> 
>> http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/82431956/src/joshua/decoder/StructuredTranslation.java
>>>>>>>>>>> 
>>>>>> ----------------------------------------------------------------------
>>>>>>>>>>> diff --git a/src/joshua/decoder/StructuredTranslation.java
>>>>>>>>> b/src/joshua/decoder/StructuredTranslation.java
>>>>>>>>>>> index 1939ea0..e3018b4 100644
>>>>>>>>>>> --- a/src/joshua/decoder/StructuredTranslation.java
>>>>>>>>>>> +++ b/src/joshua/decoder/StructuredTranslation.java
>>>>>>>>>>> @@ -10,7 +10,10 @@ import java.util.List;
>>>>>>>>>>> import java.util.Map;
>>>>>>>>>>> 
>>>>>>>>>>> import joshua.decoder.ff.FeatureFunction;
>>>>>>>>>>> +import joshua.decoder.ff.FeatureVector;
>>>>>>>>>>> import joshua.decoder.hypergraph.HyperGraph;
>>>>>>>>>>> +import joshua.decoder.hypergraph.KBestExtractor.DerivationState;
>>>>>>>>>>> +import joshua.decoder.io.DeNormalize;
>>>>>>>>>>> import
>>>> joshua.decoder.hypergraph.ViterbiFeatureVectorWalkerFunction;
>>>>>>>>>>> import
>> joshua.decoder.hypergraph.ViterbiOutputStringWalkerFunction;
>>>>>>>>>>> import joshua.decoder.hypergraph.WalkerFunction;
>>>>>>>>>>> @@ -30,77 +33,51 @@ import joshua.decoder.segment_file.Sentence;
>>>>>>>>>>> public class StructuredTranslation {
>>>>>>>>>>> 
>>>>>>>>>>> private final Sentence sourceSentence;
>>>>>>>>>>> -  private final List<FeatureFunction> featureFunctions;
>>>>>>>>>>> +  private final DerivationState derivationRoot;
>>>>>>>>>>> +  private final JoshuaConfiguration joshuaConfiguration;
>>>>>>>>>>> 
>>>>>>>>>>> -  private final String translationString;
>>>>>>>>>>> -  private final List<String> translationTokens;
>>>>>>>>>>> -  private final float translationScore;
>>>>>>>>>>> -  private List<List<Integer>> translationWordAlignments;
>>>>>>>>>>> -  private Map<String,Float> translationFeatures;
>>>>>>>>>>> -  private final float extractionTime;
>>>>>>>>>>> +  private String translationString = null;
>>>>>>>>>>> +  private List<String> translationTokens = null;
>>>>>>>>>>> +  private String translationWordAlignments = null;
>>>>>>>>>>> +  private FeatureVector translationFeatures = null;
>>>>>>>>>>> +  private float extractionTime = 0.0f;
>>>>>>>>>>> +  private float translationScore = 0.0f;
>>>>>>>>>>> 
>>>>>>>>>>> +  /* If we need to replay the features, this will get set to
>> true,
>>>>>> so
>>>>>>>>> that it's only done once */
>>>>>>>>>>> +  private boolean featuresReplayed = false;
>>>>>>>>>>> +
>>>>>>>>>>> public StructuredTranslation(final Sentence sourceSentence,
>>>>>>>>>>> -      final HyperGraph hypergraph,
>>>>>>>>>>> -      final List<FeatureFunction> featureFunctions) {
>>>>>>>>>>> -
>>>>>>>>>>> -      final long startTime = System.currentTimeMillis();
>>>>>>>>>>> -
>>>>>>>>>>> -      this.sourceSentence = sourceSentence;
>>>>>>>>>>> -      this.featureFunctions = featureFunctions;
>>>>>>>>>>> -      this.translationString = extractViterbiString(hypergraph);
>>>>>>>>>>> -      this.translationTokens = extractTranslationTokens();
>>>>>>>>>>> -      this.translationScore =
>> extractTranslationScore(hypergraph);
>>>>>>>>>>> -      this.translationFeatures =
>>>> extractViterbiFeatures(hypergraph);
>>>>>>>>>>> -      this.translationWordAlignments =
>>>>>>>>> extractViterbiWordAlignment(hypergraph);
>>>>>>>>>>> -      this.extractionTime = (System.currentTimeMillis() -
>>>>>> startTime) /
>>>>>>>>> 1000.0f;
>>>>>>>>>>> -  }
>>>>>>>>>>> -
>>>>>>>>>>> -  private Map<String,Float> extractViterbiFeatures(final
>>>> HyperGraph
>>>>>>>>> hypergraph) {
>>>>>>>>>>> -    if (hypergraph == null) {
>>>>>>>>>>> -      return emptyMap();
>>>>>>>>>>> -    } else {
>>>>>>>>>>> -      ViterbiFeatureVectorWalkerFunction
>>>> viterbiFeatureVectorWalker
>>>>>> =
>>>>>>>>> new ViterbiFeatureVectorWalkerFunction(featureFunctions,
>>>>>> sourceSentence);
>>>>>>>>>>> -      walk(hypergraph.goalNode, viterbiFeatureVectorWalker);
>>>>>>>>>>> -      return new
>>>>>>>>> HashMap<String,Float>(viterbiFeatureVectorWalker.getFeaturesMap());
>>>>>>>>>>> -    }
>>>>>>>>>>> -  }
>>>>>>>>>>> +      final DerivationState derivationRoot,
>>>>>>>>>>> +      JoshuaConfiguration config) {
>>>>>>>>>>> 
>>>>>>>>>>> -  private List<List<Integer>> extractViterbiWordAlignment(final
>>>>>>>>> HyperGraph hypergraph) {
>>>>>>>>>>> -    if (hypergraph == null) {
>>>>>>>>>>> -      return emptyList();
>>>>>>>>>>> -    } else {
>>>>>>>>>>> -      final WordAlignmentExtractor wordAlignmentWalker = new
>>>>>>>>> WordAlignmentExtractor();
>>>>>>>>>>> -      walk(hypergraph.goalNode, wordAlignmentWalker);
>>>>>>>>>>> -      return wordAlignmentWalker.getFinalWordAlignments();
>>>>>>>>>>> -    }
>>>>>>>>>>> -  }
>>>>>>>>>>> -
>>>>>>>>>>> -  private float extractTranslationScore(final HyperGraph
>>>>>> hypergraph) {
>>>>>>>>>>> -    if (hypergraph == null) {
>>>>>>>>>>> -      return 0;
>>>>>>>>>>> -    } else {
>>>>>>>>>>> -      return hypergraph.goalNode.getScore();
>>>>>>>>>>> -    }
>>>>>>>>>>> -  }
>>>>>>>>>>> -
>>>>>>>>>>> -  private String extractViterbiString(final HyperGraph
>>>> hypergraph) {
>>>>>>>>>>> -    if (hypergraph == null) {
>>>>>>>>>>> -      return sourceSentence.source();
>>>>>>>>>>> -    } else {
>>>>>>>>>>> -      final WalkerFunction viterbiOutputStringWalker = new
>>>>>>>>> ViterbiOutputStringWalkerFunction();
>>>>>>>>>>> -      walk(hypergraph.goalNode, viterbiOutputStringWalker);
>>>>>>>>>>> -      return viterbiOutputStringWalker.toString();
>>>>>>>>>>> -    }
>>>>>>>>>>> +    this(sourceSentence, derivationRoot, config, true);
>>>>>>>>>>> }
>>>>>>>>>>> +
>>>>>>>>>>> 
>>>>>>>>>>> -  private List<String> extractTranslationTokens() {
>>>>>>>>>>> -    if (translationString.isEmpty()) {
>>>>>>>>>>> -      return emptyList();
>>>>>>>>>>> -    } else {
>>>>>>>>>>> -      return asList(translationString.split("\\s+"));
>>>>>>>>>>> +  public StructuredTranslation(final Sentence sourceSentence,
>>>>>>>>>>> +      final DerivationState derivationRoot,
>>>>>>>>>>> +      JoshuaConfiguration config,
>>>>>>>>>>> +      boolean now) {
>>>>>>>>>>> +
>>>>>>>>>>> +    final long startTime = System.currentTimeMillis();
>>>>>>>>>>> +
>>>>>>>>>>> +    this.sourceSentence = sourceSentence;
>>>>>>>>>>> +    this.derivationRoot = derivationRoot;
>>>>>>>>>>> +    this.joshuaConfiguration = config;
>>>>>>>>>>> +
>>>>>>>>>>> +    if (now) {
>>>>>>>>>>> +      getTranslationString();
>>>>>>>>>>> +      getTranslationTokens();
>>>>>>>>>>> +      getTranslationScore();
>>>>>>>>>>> +      getTranslationFeatures();
>>>>>>>>>>> +      getTranslationWordAlignments();
>>>>>>>>>>> }
>>>>>>>>>>> +    this.translationScore = getTranslationScore();
>>>>>>>>>>> +
>>>>>>>>>>> +    this.extractionTime = (System.currentTimeMillis() -
>>>> startTime) /
>>>>>>>>> 1000.0f;
>>>>>>>>>>> }
>>>>>>>>>>> 
>>>>>>>>>>> +
>>>>>>>>>>> // Getters to use upstream
>>>>>>>>>>> 
>>>>>>>>>>> public Sentence getSourceSentence() {
>>>>>>>>>>> @@ -112,25 +89,60 @@ public class StructuredTranslation {
>>>>>>>>>>> }
>>>>>>>>>>> 
>>>>>>>>>>> public String getTranslationString() {
>>>>>>>>>>> -    return translationString;
>>>>>>>>>>> +    if (this.translationString == null) {
>>>>>>>>>>> +      if (derivationRoot == null) {
>>>>>>>>>>> +        this.translationString = sourceSentence.source();
>>>>>>>>>>> +      } else {
>>>>>>>>>>> +        this.translationString = derivationRoot.getHypothesis();
>>>>>>>>>>> +      }
>>>>>>>>>>> +    }
>>>>>>>>>>> +    return this.translationString;
>>>>>>>>>>> }
>>>>>>>>>>> 
>>>>>>>>>>> public List<String> getTranslationTokens() {
>>>>>>>>>>> +    if (this.translationTokens == null) {
>>>>>>>>>>> +      String trans = getTranslationString();
>>>>>>>>>>> +      if (trans.isEmpty()) {
>>>>>>>>>>> +        this.translationTokens = emptyList();
>>>>>>>>>>> +      } else {
>>>>>>>>>>> +        this.translationTokens = asList(trans.split("\\s+"));
>>>>>>>>>>> +      }
>>>>>>>>>>> +    }
>>>>>>>>>>> +
>>>>>>>>>>> return translationTokens;
>>>>>>>>>>> }
>>>>>>>>>>> 
>>>>>>>>>>> public float getTranslationScore() {
>>>>>>>>>>> +    if (derivationRoot == null) {
>>>>>>>>>>> +      this.translationScore = 0.0f;
>>>>>>>>>>> +    } else {
>>>>>>>>>>> +      this.translationScore = derivationRoot.getModelCost();
>>>>>>>>>>> +    }
>>>>>>>>>>> +
>>>>>>>>>>> return translationScore;
>>>>>>>>>>> }
>>>>>>>>>>> 
>>>>>>>>>>> /**
>>>>>>>>>>> * Returns a list of target to source alignments.
>>>>>>>>>>> */
>>>>>>>>>>> -  public List<List<Integer>> getTranslationWordAlignments() {
>>>>>>>>>>> -    return translationWordAlignments;
>>>>>>>>>>> +  public String getTranslationWordAlignments() {
>>>>>>>>>>> +    if (this.translationWordAlignments == null) {
>>>>>>>>>>> +      if (derivationRoot == null)
>>>>>>>>>>> +        this.translationWordAlignments = "";
>>>>>>>>>>> +      else {
>>>>>>>>>>> +        WordAlignmentExtractor wordAlignmentExtractor = new
>>>>>>>>> WordAlignmentExtractor();
>>>>>>>>>>> +        derivationRoot.visit(wordAlignmentExtractor);
>>>>>>>>>>> +        this.translationWordAlignments =
>>>>>>>>> wordAlignmentExtractor.toString();
>>>>>>>>>>> +      }
>>>>>>>>>>> +    }
>>>>>>>>>>> +
>>>>>>>>>>> +    return this.translationWordAlignments;
>>>>>>>>>>> }
>>>>>>>>>>> 
>>>>>>>>>>> -  public Map<String,Float> getTranslationFeatures() {
>>>>>>>>>>> +  public FeatureVector getTranslationFeatures() {
>>>>>>>>>>> +    if (this.translationFeatures == null)
>>>>>>>>>>> +      this.translationFeatures =
>> derivationRoot.replayFeatures();
>>>>>>>>>>> +
>>>>>>>>>>> return translationFeatures;
>>>>>>>>>>> }
>>>>>>>>> 
>>>>>> 
>>>> 
>> http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/82431956/src/joshua/decoder/hypergraph/KBestExtractor.java
>>>>>>>>>>> 
>>>>>> ----------------------------------------------------------------------
>>>>>>>>>>> diff --git a/src/joshua/decoder/hypergraph/KBestExtractor.java
>>>>>>>>> b/src/joshua/decoder/hypergraph/KBestExtractor.java
>>>>>>>>>>> index 42539cc..ea6ca73 100644
>>>>>>>>>>> --- a/src/joshua/decoder/hypergraph/KBestExtractor.java
>>>>>>>>>>> +++ b/src/joshua/decoder/hypergraph/KBestExtractor.java
>>>>>>>>>>> @@ -34,6 +34,7 @@ import java.util.regex.Matcher;
>>>>>>>>>>> import joshua.corpus.Vocabulary;
>>>>>>>>>>> import joshua.decoder.BLEU;
>>>>>>>>>>> import joshua.decoder.JoshuaConfiguration;
>>>>>>>>>>> +import joshua.decoder.StructuredTranslation;
>>>>>>>>>>> import joshua.decoder.chart_parser.ComputeNodeResult;
>>>>>>>>>>> import joshua.decoder.ff.FeatureFunction;
>>>>>>>>>>> import joshua.decoder.ff.FeatureVector;
>>>>>>>>>>> @@ -167,33 +168,25 @@ public class KBestExtractor {
>>>>>>>>>>> // Determine the k-best hypotheses at each HGNode
>>>>>>>>>>> VirtualNode virtualNode = getVirtualNode(node);
>>>>>>>>>>> DerivationState derivationState =
>>>>>>>>> virtualNode.lazyKBestExtractOnNode(this, k);
>>>>>>>>>>> +
>>>>>>>>>>> //    DerivationState derivationState = getKthDerivation(node,
>> k);
>>>>>>>>>>> if (derivationState != null) {
>>>>>>>>>>> -      // ==== read the kbest from each hgnode and convert to
>>>> output
>>>>>>>>> format
>>>>>>>>>>> -      FeatureVector features = new FeatureVector();
>>>>>>>>>>> 
>>>>>>>>>>> -      /*
>>>>>>>>>>> -       * To save space, the decoder only stores the model cost,
>> no
>>>>>> the
>>>>>>>>> individual feature values. If
>>>>>>>>>>> -       * you want to output them, you have to replay them.
>>>>>>>>>>> -       */
>>>>>>>>>>> -      String hypothesis = null;
>>>>>>>>>>> -      if (joshuaConfiguration.outputFormat.contains("%f")
>>>>>>>>>>> -          || joshuaConfiguration.outputFormat.contains("%d"))
>>>>>>>>>>> -        features = derivationState.replayFeatures();
>>>>>>>>>>> -
>>>>>>>>>>> -      hypothesis = derivationState.getHypothesis()
>>>>>>>>>>> +      StructuredTranslation translation = new
>>>> StructuredTranslation(
>>>>>>>>>>> +          sentence, derivationState, joshuaConfiguration);
>>>>>>>>>>> +
>>>>>>>>>>> +      String hypothesis = translation.getTranslationString()
>>>>>>>>>>>     .replaceAll("-lsb-", "[")
>>>>>>>>>>>     .replaceAll("-rsb-", "]")
>>>>>>>>>>>     .replaceAll("-pipe-", "|");
>>>>>>>>>>> 
>>>>>>>>>>> -
>>>>>>>>>>> outputString = joshuaConfiguration.outputFormat
>>>>>>>>>>>     .replace("%k", Integer.toString(k))
>>>>>>>>>>>     .replace("%s", hypothesis)
>>>>>>>>>>>     .replace("%S", DeNormalize.processSingleLine(hypothesis))
>>>>>>>>>>>     .replace("%i", Integer.toString(sentence.id()))
>>>>>>>>>>> -          .replace("%f", joshuaConfiguration.moses ?
>>>>>>>>> features.mosesString() : features.toString())
>>>>>>>>>>> -          .replace("%c", String.format("%.3f",
>>>>>> derivationState.cost));
>>>>>>>>>>> +          .replace("%f", joshuaConfiguration.moses ?
>>>>>>>>> translation.getTranslationFeatures().mosesString() :
>>>>>>>>> translation.getTranslationFeatures().toString())
>>>>>>>>>>> +          .replace("%c", String.format("%.3f",
>>>>>>>>> translation.getTranslationScore()));
>>>>>>>>>>> 
>>>>>>>>>>> if (joshuaConfiguration.outputFormat.contains("%t")) {
>>>>>>>>>>>   outputString = outputString.replace("%t",
>>>>>>>>> derivationState.getTree());
>>>>>>>>>>> @@ -250,11 +243,11 @@ public class KBestExtractor {
>>>>>>>>>>> return;
>>>>>>>>>>> 
>>>>>>>>>>> for (int k = 1; k <= topN; k++) {
>>>>>>>>>>> -      String hypStr = getKthHyp(hg.goalNode, k);
>>>>>>>>>>> -      if (null == hypStr)
>>>>>>>>>>> +      String translation = getKthHyp(hg.goalNode, k);
>>>>>>>>>>> +      if (null == translation)
>>>>>>>>>>>   break;
>>>>>>>>>>> 
>>>>>>>>>>> -      out.write(hypStr);
>>>>>>>>>>> +      out.write(translation);
>>>>>>>>>>> out.write("\n");
>>>>>>>>>>> out.flush();
>>>>>>>>>>> }
>>>>>>>>>>> @@ -704,11 +697,11 @@ public class KBestExtractor {
>>>>>>>>>>> /**
>>>>>>>>>>> * Visits every state in the derivation in a depth-first order.
>>>>>>>>>>> */
>>>>>>>>>>> -    private DerivationVisitor visit(DerivationVisitor visitor) {
>>>>>>>>>>> +    public DerivationVisitor visit(DerivationVisitor visitor) {
>>>>>>>>>>> return visit(visitor, 0);
>>>>>>>>>>> }
>>>>>>>>>>> 
>>>>>>>>>>> -    private DerivationVisitor visit(DerivationVisitor visitor,
>> int
>>>>>>>>> indent) {
>>>>>>>>>>> +    public DerivationVisitor visit(DerivationVisitor visitor,
>> int
>>>>>>>>> indent) {
>>>>>>>>>>> 
>>>>>>>>>>> visitor.before(this, indent);
>>>>>>>>>>> 
>>>>>>>>>>> @@ -733,25 +726,25 @@ public class KBestExtractor {
>>>>>>>>>>> return visitor;
>>>>>>>>>>> }
>>>>>>>>>>> 
>>>>>>>>>>> -    private String getHypothesis() {
>>>>>>>>>>> +    public String getHypothesis() {
>>>>>>>>>>> return getHypothesis(defaultSide);
>>>>>>>>>>> }
>>>>>>>>>>> 
>>>>>>>>>>> -    private String getTree() {
>>>>>>>>>>> +    public String getTree() {
>>>>>>>>>>> return visit(new TreeExtractor()).toString();
>>>>>>>>>>> }
>>>>>>>>>>> 
>>>>>>>>>>> -    private String getHypothesis(Side side) {
>>>>>>>>>>> +    public String getHypothesis(Side side) {
>>>>>>>>>>> return visit(new HypothesisExtractor(side)).toString();
>>>>>>>>>>> }
>>>>>>>>>>> 
>>>>>>>>>>> -    private FeatureVector replayFeatures() {
>>>>>>>>>>> +    public FeatureVector replayFeatures() {
>>>>>>>>>>> FeatureReplayer fp = new FeatureReplayer();
>>>>>>>>>>> visit(fp);
>>>>>>>>>>> return fp.getFeatures();
>>>>>>>>>>> }
>>>>>>>>>>> 
>>>>>>>>>>> -    private String getDerivation() {
>>>>>>>>>>> +    public String getDerivation() {
>>>>>>>>>>> return visit(new DerivationExtractor()).toString();
>>>>>>>>>>> }
>>>>>>>>>>> 
>>>>>>>>>>> @@ -811,7 +804,7 @@ public class KBestExtractor {
>>>>>>>>>>> */
>>>>>>>>>>> void after(DerivationState state, int level);
>>>>>>>>>>> }
>>>>>>>>>>> -
>>>>>>>>>>> +
>>>>>>>>>>> /**
>>>>>>>>>>> * Extracts the hypothesis from the leaves of the tree using the
>>>>>>>>> generic (depth-first) visitor.
>>>>>>>>>>> * Since we're using the visitor, we can't just print out the
>> words
>>>> as
>>>>>>>>> we see them. We have to
>>>>>>>>>>> @@ -878,7 +871,7 @@ public class KBestExtractor {
>>>>>>>>>>> return outputs.pop().replaceAll("<s> ", "").replace(" </s>",
>> "");
>>>>>>>>>>> }
>>>>>>>>>>> }
>>>>>>>>>>> -
>>>>>>>>>>> +
>>>>>>>>>>> /**
>>>>>>>>>>> * Assembles a Penn treebank format tree for a given derivation.
>>>>>>>>>>> */
>>>>>>>>> 
>>>>>>>>> 
>>>>>>> Amazon Development Center Germany GmbH
>>>>>>> Berlin - Dresden - Aachen
>>>>>>> main office: Krausenstr. 38, 10117 Berlin
>>>>>>> Geschaeftsfuehrer: Dr. Ralf Herbrich, Christian Schlaeger
>>>>>>> Ust-ID: DE289237879
>>>>>>> Eingetragen am Amtsgericht Charlottenburg HRB 149173 B
>>>>>>> 
>>>>>> 
>>>>>> 
>>>> 
>>>> 
>> 
>>
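
The iterator idea discussed in the thread (KBestExtractor handing back translation objects one at a time, with expensive items such as replayed features computed only on request) could look roughly like the sketch below. This is only an illustration under stated assumptions: LazyKBestSketch, LazyTranslation, KBestIterator, and the kthBest callback are hypothetical names invented here, not Joshua classes, and the callback merely stands in for whatever the real extractor does to produce the k-th derivation.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.IntFunction;
import java.util.function.Supplier;

/** Hypothetical sketch; none of these types are Joshua's real classes. */
public class LazyKBestSketch {

  /** Stand-in for a translation whose expensive fields are computed on first request. */
  public static final class LazyTranslation {
    private final String hypothesis;
    private final Supplier<String> featureReplayer; // stands in for replaying features
    private String features = null;                 // cached after the first request

    public LazyTranslation(String hypothesis, Supplier<String> featureReplayer) {
      this.hypothesis = hypothesis;
      this.featureReplayer = featureReplayer;
    }

    public String getTranslationString() {
      return hypothesis;
    }

    /** Replays features at most once, and only if a caller asks for them. */
    public String getTranslationFeatures() {
      if (features == null) {
        features = featureReplayer.get();
      }
      return features;
    }
  }

  /** Walks derivations 1..topN, asking for each one only when the caller needs it. */
  public static final class KBestIterator implements Iterator<LazyTranslation> {
    private final IntFunction<LazyTranslation> kthBest; // returns null when exhausted
    private final int topN;
    private int k = 1;
    private LazyTranslation lookahead = null;

    public KBestIterator(IntFunction<LazyTranslation> kthBest, int topN) {
      this.kthBest = kthBest;
      this.topN = topN;
    }

    @Override
    public boolean hasNext() {
      if (lookahead == null && k <= topN) {
        lookahead = kthBest.apply(k);
        if (lookahead == null) {
          k = topN + 1; // no more derivations: stop asking the extractor
        }
      }
      return lookahead != null;
    }

    @Override
    public LazyTranslation next() {
      if (!hasNext()) {
        throw new NoSuchElementException();
      }
      LazyTranslation result = lookahead;
      lookahead = null;
      k++;
      return result;
    }
  }

  public static void main(String[] args) {
    // Toy data in place of a real hypergraph; the feature string is made up.
    List<String> hyps = Arrays.asList("the house is small", "the home is small");
    Iterator<LazyTranslation> it = new KBestIterator(
        k -> k <= hyps.size()
            ? new LazyTranslation(hyps.get(k - 1), () -> "lm=-1.0")
            : null,
        10);
    while (it.hasNext()) {
      System.out.println(it.next().getTranslationString());
    }
  }
}
```

A caller that only wants the single best translation calls next() once and never touches getTranslationFeatures(), so neither further derivations nor feature replay are requested in this sketch.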
