[
https://issues.apache.org/jira/browse/JOSHUA-272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matt Post resolved JOSHUA-272.
------------------------------
Resolution: Fixed
Fixed with [a recent
commit|https://github.com/apache/incubator-joshua/commit/aef0b2dbe4555070aec9f15bb2c8d9dcb5671dcd].
> Simplify the packing and usage of phrase-based grammars
> -------------------------------------------------------
>
> Key: JOSHUA-272
> URL: https://issues.apache.org/jira/browse/JOSHUA-272
> Project: Joshua
> Issue Type: Improvement
> Reporter: Matt Post
> Assignee: Matt Post
> Fix For: 6.1
>
>
> For historical reasons, phrase-based grammars add some complexity to
> decoding. The complete tree under each top-level trie node in packed grammars
> has to fit within a single packed grammar slice, which is limited to 2 GB
> due to constraints on the size of Java byte[] arrays. We used to sort on just
> the first item in the trie, which was a problem for phrase-based decoding,
> since phrase-based rules are implemented as left-branching hierarchical
> rules. In order to pack large grammars, we packed them without the leading
> [X,1], and then added it when loading the grammars, both for the packed and
> memory-based grammars. This was a real mess.
> This was all fixed with a commit a while ago that packs and reads packed
> grammars based on the first two symbols on the source side. So we should
> remove all the complexity associated with phrases. They should just be
> regular rules. There is also a lot of redundancy across the codebase in
> parsing rules, converting them to different formats, and so on.
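
The description says packing is now based on the first two symbols on the source side. Below is a minimal sketch of why that helps, assuming rules are represented as plain whitespace-separated source-side strings. The names SliceKeySketch, sliceKey, and groupBySliceKey are hypothetical and are not taken from Joshua's packer. Since phrase-based rules all begin with [X,1], keying on one symbol would put every rule in the same group, while keying on two symbols splits them by the first word that follows.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SliceKeySketch {

    // Hypothetical helper: the grouping key is the first two source-side
    // symbols, or the only symbol when the source side has just one.
    public static String sliceKey(String sourceSide) {
        String[] symbols = sourceSide.trim().split("\\s+");
        if (symbols.length < 2) {
            return symbols[0];
        }
        return symbols[0] + " " + symbols[1];
    }

    // Groups source sides by their two-symbol key. In this sketch each group
    // stands in for the set of rules that must share one slice.
    public static Map<String, List<String>> groupBySliceKey(List<String> sourceSides) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (String source : sourceSides) {
            groups.computeIfAbsent(sliceKey(source), k -> new ArrayList<>()).add(source);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Illustrative phrase-based source sides, all with the leading [X,1].
        List<String> sources = Arrays.asList(
            "[X,1] the house",
            "[X,1] the car",
            "[X,1] a house");
        Map<String, List<String>> groups = groupBySliceKey(sources);
        for (Map.Entry<String, List<String>> entry : groups.entrySet()) {
            System.out.println(entry.getKey() + " -> " + entry.getValue().size() + " rule(s)");
        }
    }
}
```

With a one-symbol key, all three example rules would fall under [X,1]. With the two-symbol key they split into two groups, which is what lets the leading [X,1] stay in the packed rules without forcing every phrase into a single slice.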