Matt Post created JOSHUA-272:
--------------------------------

             Summary: Simplify the packing and usage of phrase-based grammars
                 Key: JOSHUA-272
                 URL: https://issues.apache.org/jira/browse/JOSHUA-272
             Project: Joshua
          Issue Type: Improvement
            Reporter: Matt Post
            Assignee: Matt Post
             Fix For: 6.1


For historical reasons, phrase-based grammars add some complexity to decoding. 
The complete tree under each top-level trie node in packed grammars has to fit 
within a single packed grammar slice, which is limited to 2 GB due to 
constraints on the size of Java byte[] arrays. We used to sort on just the 
first item in the trie, which was a problem for phrase-based decoding, since 
phrase-based rules are implemented as left-branching hierarchical rules. In 
order to pack large grammars, we packed them without the leading [X,1], and 
then added it back when loading the grammars, for both the packed and 
memory-based grammars. This was a real mess.
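
To make the old workaround concrete, here is a minimal sketch of the 
strip-on-pack, restore-on-load round trip. It is not the actual Joshua code: 
the class name, the method names, and the representation of a rule's source 
side as a list of tokens are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PhraseRuleWorkaround {
    // The nonterminal that makes a phrase a left-branching hierarchical rule.
    static final String LEADING_NT = "[X,1]";

    /** Hypothetical pack-time step: drop the leading nonterminal if present. */
    static List<String> stripLeading(List<String> source) {
        if (!source.isEmpty() && source.get(0).equals(LEADING_NT)) {
            return new ArrayList<>(source.subList(1, source.size()));
        }
        return new ArrayList<>(source);
    }

    /** Hypothetical load-time step: put the leading nonterminal back. */
    static List<String> restoreLeading(List<String> source) {
        List<String> out = new ArrayList<>();
        out.add(LEADING_NT);
        out.addAll(source);
        return out;
    }

    public static void main(String[] args) {
        List<String> rule = Arrays.asList("[X,1]", "el", "gato");
        List<String> packed = stripLeading(rule);      // what would be stored
        List<String> loaded = restoreLeading(packed);  // what the decoder sees
        System.out.println(packed);
        System.out.println(loaded);
    }
}
```

Every reader and writer of the grammar has to agree on this convention, which 
is where the extra complexity comes from.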

This was all fixed with a commit a while ago that packs and reads packed 
grammars based on the first two symbols on the source side. So we should remove 
all the complexity associated with phrases. They should just be regular rules. 
There is also a lot of redundancy across the codebase in parsing rules, 
converting them to different formats, and so on.
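
As a rough illustration of why keying on two symbols helps, the sketch below 
groups rule source sides by their first k tokens. It is an assumption-laden 
toy, not the packer itself: groupByPrefix, the class name, and the sample 
rules are hypothetical. With k = 1, every phrase-based rule lands in the same 
group because each one starts with [X,1]; with k = 2, the rules are split by 
their first word.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SliceGrouping {
    /** Group rule source sides by the first k tokens (hypothetical helper). */
    static Map<String, List<List<String>>> groupByPrefix(
            List<List<String>> rules, int k) {
        Map<String, List<List<String>>> groups = new TreeMap<>();
        for (List<String> source : rules) {
            // Rules shorter than k tokens are keyed on everything they have.
            int n = Math.min(k, source.size());
            String key = String.join(" ", source.subList(0, n));
            groups.computeIfAbsent(key, unused -> new ArrayList<>()).add(source);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<List<String>> rules = Arrays.asList(
            Arrays.asList("[X,1]", "el", "gato"),
            Arrays.asList("[X,1]", "el", "perro"),
            Arrays.asList("[X,1]", "la", "casa"));

        // One symbol: a single group holding the whole phrase table.
        System.out.println(groupByPrefix(rules, 1).keySet());
        // Two symbols: one group per distinct first word.
        System.out.println(groupByPrefix(rules, 2).keySet());
    }
}
```

With the two-symbol key, no single group has to hold the whole phrase table, 
so the leading [X,1] can stay in the packed rules.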



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)