[ https://issues.apache.org/jira/browse/JOSHUA-289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matt Post updated JOSHUA-289:
-----------------------------
Fix Version/s: 6.2 (was: 6.1)
> Fix output formatting
> ---------------------
>
> Key: JOSHUA-289
> URL: https://issues.apache.org/jira/browse/JOSHUA-289
> Project: Joshua
> Issue Type: Improvement
> Reporter: Matt Post
> Assignee: Matt Post
> Fix For: 6.2
>
>
> This is a sub-ticket of JOSHUA-273.
> Joshua output formatting is a mess. The StructuredTranslation piece is a good
> step in the right direction, but many problems remain. Here is a list of
> problems and corrections.
> - There are currently four variables that together define separate paths
> for formatting the output: server mode (two different types) versus regular
> mode, whether use_structured_translations is set, whether topN == 0 (i.e.,
> whether we are outputting k-best or just the quick Viterbi best), and whether
> we are projecting case or denormalizing the output.
> - In TCP mode, ServerThread.java.run() iterates over Translation objects
> returned by Translations. Translation.toString() is then called. %S and
> recasing are applied.
> - In HTTP mode, ServerThread.java.handle() builds a JSONMessage, which in
> turn calls
> translation.getStructuredTranslations().get(0).getTranslationString(). Neither
> recasing nor %S formatting is applied.
> - In regular mode, we call Translation.toString(), which formats output in a
> complicated way in the constructor, using different methods depending on
> whether (a) use_structured_translations is set and (b) topN == 0. This is a
> veritable mess of nested, redundant output formatting. Some of these methods
> in turn rely on separate formatting applied in KBestExtractor's constructor.
> Suggestions:
> - Get rid of topN==0. Viterbi extraction should be quicker than k-best and
> should be used automatically whenever possible. The same output formatting
> should apply in either case.
> - We should always use structured outputs, even collapsing
> StructuredTranslation into Translation.
> - Move all output formatting out of KBestExtractor. The extractor should
> just return k-best items.
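
The suggestions above could be sketched roughly as follows. This is only an illustration under stated assumptions, not Joshua's actual code: the class OutputFormatSketch, the Item type, the extract/format/render methods, and the {id}/{text}/{score} placeholders are all hypothetical names made up for this example. The idea it shows is that extraction returns unformatted items, and a single formatting routine is applied to them whether one item or k items were requested.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class OutputFormatSketch {

    /** Hypothetical stand-in for a k-best item: raw text plus a model score. */
    public static final class Item {
        final int sentenceId;
        final String rawText;
        final double score;

        Item(int sentenceId, String rawText, double score) {
            this.sentenceId = sentenceId;
            this.rawText = rawText;
            this.score = score;
        }
    }

    /**
     * Hypothetical extractor: returns at most k items and applies no
     * formatting. The scores are dummy values, for illustration only.
     */
    public static List<Item> extract(int sentenceId, List<String> candidates, int k) {
        List<Item> items = new ArrayList<>();
        for (int i = 0; i < candidates.size() && i < k; i++) {
            items.add(new Item(sentenceId, candidates.get(i), -1.0 * (i + 1)));
        }
        return items;
    }

    /** The single formatting path: expands a template for one item. */
    public static String format(Item item, String template) {
        // Substitute the text last so that the id and score placeholders
        // are never matched inside the translation itself.
        return template
            .replace("{id}", Integer.toString(item.sentenceId))
            .replace("{score}", String.format(Locale.ROOT, "%.3f", item.score))
            .replace("{text}", item.rawText);
    }

    /** Every caller (TCP, HTTP, regular mode) would go through this method. */
    public static List<String> render(List<Item> items, String template) {
        List<String> lines = new ArrayList<>();
        for (Item item : items) {
            lines.add(format(item, template));
        }
        return lines;
    }

    public static void main(String[] args) {
        List<String> candidates =
            Arrays.asList("the house is small", "the home is small");
        String template = "{id} ||| {text} ||| {score}";

        // Asking for one item or for several uses the same formatting code.
        System.out.println(render(extract(0, candidates, 1), template));
        System.out.println(render(extract(0, candidates, 2), template));
    }
}
```

With this shape there is no separate topN == 0 branch to keep in sync: a faster single-best search could be substituted inside the extractor without touching render(), and recasing or denormalization would become one more step inside format() rather than a per-mode special case.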