[ https://issues.apache.org/jira/browse/LUCENE-3297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081571#comment-13081571 ]
Dawid Weiss commented on LUCENE-3297:
-------------------------------------

I guess I just don't like marker values from the domain range (like END_LABEL)... they make me nervous. I'll experiment.

> FST doesn't fully share common prefix across all outputs
> --------------------------------------------------------
>
> Key: LUCENE-3297
> URL: https://issues.apache.org/jira/browse/LUCENE-3297
> Project: Lucene - Java
> Issue Type: Improvement
> Components: core/FSTs
> Reporter: Michael McCandless
> Priority: Minor
>
> FST will try to share prefixes of outputs when possible, however in the [I
> think unusual in practice] case where all outputs share a common prefix, FST
> really ought to store this just once, on the root arc, but instead it's only
> able to push back to the N root arcs. It's sort of an off-by-one on how far
> back the pushing goes...
> One [synthetic] example where this makes a big difference is the new
> Test2BPostings test, when it uses MemoryCodec, because this test has 26 terms
> (letters of alphabet) and each term has exactly the same long (~85 MB) all 1s
> byte[] as the postings. If we fixed this issue, then the resulting FST would
> only be ~85 MB but now instead it needs to be ~85 * 26 MB.

--
This message is automatically generated by JIRA.
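To make the "off-by-one" in the issue description concrete, here is a toy byte-counting model, not Lucene's FST builder. It assumes outputs are plain byte[] values, groups terms by their first label to stand in for the N root arcs, and counts output bytes only (no deeper sharing, no arc or node overhead). The class and method names (`OutputPrefixSketch`, `bytesPushedToRootArcs`, `bytesWithRootPrefix`, `alphabetWithSameOutput`) are hypothetical, and the ~85 MB output from Test2BPostings is scaled down to 1000 bytes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical toy model; NOT org.apache.lucene.util.fst.
public class OutputPrefixSketch {

    // Longest common prefix of two byte arrays.
    static byte[] commonPrefix(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        int i = 0;
        while (i < n && a[i] == b[i]) {
            i++;
        }
        return Arrays.copyOf(a, i);
    }

    // Longest common prefix of all outputs (empty if there are none).
    static byte[] commonPrefix(Collection<byte[]> outputs) {
        byte[] prefix = null;
        for (byte[] o : outputs) {
            prefix = (prefix == null) ? o : commonPrefix(prefix, o);
        }
        return prefix == null ? new byte[0] : prefix;
    }

    // Current behavior as described in the issue: shared output bytes are pushed
    // back only as far as the root arcs (one per distinct first label), so each
    // root arc stores the prefix common to the outputs below it.
    // Assumes every term is non-empty.
    static long bytesPushedToRootArcs(SortedMap<String, byte[]> termToOutput) {
        Map<Character, List<byte[]>> byFirstLabel = new TreeMap<>();
        for (Map.Entry<String, byte[]> e : termToOutput.entrySet()) {
            byFirstLabel.computeIfAbsent(e.getKey().charAt(0), k -> new ArrayList<>())
                        .add(e.getValue());
        }
        long total = 0;
        for (List<byte[]> outs : byFirstLabel.values()) {
            int shared = commonPrefix(outs).length;
            total += shared;                  // stored once on this root arc
            for (byte[] o : outs) {
                total += o.length - shared;   // remainders, counted as-is
            }
        }
        return total;
    }

    // Proposed behavior: the prefix shared by ALL outputs is stored once at the
    // root, then the remaining bytes are pushed back to the root arcs as above.
    static long bytesWithRootPrefix(SortedMap<String, byte[]> termToOutput) {
        int global = commonPrefix(termToOutput.values()).length;
        SortedMap<String, byte[]> stripped = new TreeMap<>();
        for (Map.Entry<String, byte[]> e : termToOutput.entrySet()) {
            byte[] o = e.getValue();
            stripped.put(e.getKey(), Arrays.copyOfRange(o, global, o.length));
        }
        return global + bytesPushedToRootArcs(stripped);
    }

    // 26 single-letter terms, each with the same all-1s output of the given length.
    static SortedMap<String, byte[]> alphabetWithSameOutput(int length) {
        byte[] output = new byte[length];
        Arrays.fill(output, (byte) 1);
        SortedMap<String, byte[]> m = new TreeMap<>();
        for (char c = 'a'; c <= 'z'; c++) {
            m.put(String.valueOf(c), output);
        }
        return m;
    }

    public static void main(String[] args) {
        SortedMap<String, byte[]> terms = alphabetWithSameOutput(1000);
        System.out.println("pushed to root arcs: " + bytesPushedToRootArcs(terms));
        System.out.println("stored once at root: " + bytesWithRootPrefix(terms));
    }
}
```

In this model the alphabet case stores 26 × 1000 bytes when the prefix stops at the root arcs and 1000 bytes when it is stored once at the root, the same 26x ratio as the ~85 * 26 MB vs ~85 MB figures above.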