[ 
https://issues.apache.org/jira/browse/JOSHUA-289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15471376#comment-15471376
 ] 

Matt Post commented on JOSHUA-289:
----------------------------------

This is a larger sub-issue, and I've decided to move it to the 7 release.

> Fix output formatting
> ---------------------
>
>                 Key: JOSHUA-289
>                 URL: https://issues.apache.org/jira/browse/JOSHUA-289
>             Project: Joshua
>          Issue Type: Improvement
>            Reporter: Matt Post
>            Assignee: Matt Post
>             Fix For: 6.2
>
>
> This is a sub ticket of JOSHUA-273.
> Joshua output formatting is a mess. The StructuredTranslation piece is a good 
> step in the right direction, but many problems remain. Here is a list of 
> problems and corrections.
> - There are currently four variables that define separate paths for 
> formatting the output: the mode (regular, or one of two server types), 
> whether use_structured_translations is set, whether topN == 0 (i.e., 
> whether we are outputting k-best or just the quick Viterbi best), and 
> whether we are projecting case or denormalizing the output.
> - In TCP mode, ServerThread.java.run() iterates over the Translation objects 
> returned by Translations and calls Translation.toString() on each. %S 
> formatting and recasing are applied.
> - In HTTP mode, ServerThread.java.handle() builds a JSONMessage, which in 
> turn calls 
> translation.getStructuredTranslations.get(0).getTranslationString(). 
> Neither recasing nor %S formatting is applied.
> - In regular mode, we call Translation.toString(), which formats output in a 
> complicated way in the constructor, using different methods depending on 
> whether (a) use_structured_translations is set and (b) topN == 0. This is a 
> mess of nested, redundant output formatting. Some of these methods in turn 
> use separate formatting applied in KBestExtractor's constructor.
> Suggestions:
> - Get rid of topN==0. Viterbi extraction should be quicker than k-best and is 
> used automatically if possible. The same output formatting should apply in 
> either case.
> - We should always use structured outputs, even collapsing 
> StructuredTranslation into Translation.
> - Move all output formatting out of KBestExtractor. This should just return 
> k-best items.
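
A minimal sketch of what a single formatting path could look like, assuming
the suggestions above are adopted. Everything here is hypothetical and is not
Joshua's actual API: the class UnifiedFormatSketch, the Hypothesis type, and
the %i / %s / %c template placeholders are illustrative names only. The point
is that the extractor returns plain items and one formatter handles both the
Viterbi case (a list of one) and the k-best case.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch; none of these names come from the Joshua code base.
final class UnifiedFormatSketch {

    // Stand-in for one extracted item: no formatting logic lives here.
    static final class Hypothesis {
        final int sentenceId;
        final String text;
        final double score;

        Hypothesis(int sentenceId, String text, double score) {
            this.sentenceId = sentenceId;
            this.text = text;
            this.score = score;
        }
    }

    // Fills a template with the fields of one hypothesis.
    // Assumed placeholders: %i = sentence id, %c = score, %s = text.
    // %s is substituted last so that the translation text itself is never
    // rescanned for placeholders.
    static String format(Hypothesis h, String template) {
        return template
            .replace("%i", Integer.toString(h.sentenceId))
            .replace("%c", String.format(Locale.ROOT, "%.3f", h.score))
            .replace("%s", h.text);
    }

    // The same path serves Viterbi output (one item) and k-best output.
    static List<String> formatAll(List<Hypothesis> items, String template) {
        List<String> lines = new ArrayList<>();
        for (Hypothesis h : items) {
            lines.add(format(h, template));
        }
        return lines;
    }
}
```

With a design along these lines, the TCP, HTTP, and regular modes would all
call the same formatAll method and differ only in how they transport the
resulting strings.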



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
