[ https://issues.apache.org/jira/browse/LUCENE-7863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mikhail Khludnev updated LUCENE-7863:
-------------------------------------
    Attachment: LUCENE-7863.patch

[^LUCENE-7863.patch] replaces TreeMap<String> with BytesRefArray; see {{ByteArrayDerivativeWriter.java}}. Here are the results for 5M docs:

|round|indexing, min|search, req/sec|RAM total, GB|index size, GB|
|EdgeNGram|85|27.82|2.3|23|
|derived edges|51|7.22|5.5|9.1|

We gain on index size and even on indexing time, at the cost of some RAM, as expected. The EdgeNGram cache can be made a little more compact; the trick is to append something to each edge gram to make it unique. The interesting thing is that search is more than 3 times slower (7.22 vs. 27.82 req/sec). I suppose the posting offsets obtained during term expansion could be sorted before reading postings.

> Don't repeat postings (and perhaps positions) on ReverseWF, EdgeNGram, etc
> ----------------------------------------------------------------------------
>
>                 Key: LUCENE-7863
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7863
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/index
>            Reporter: Mikhail Khludnev
>         Attachments: benchmark-1m.out, LUCENE-7863.hazard, LUCENE-7863.patch,
> LUCENE-7863.patch, LUCENE-7863.patch, LUCENE-7863.patch, LUCENE-7863.patch,
> LUCENE-7863.patch, LUCENE-7863.patch, LUCENE-7863.patch, LUCENE-7863.patch
>
>
> h2. Context
> \*suffix and \*infix\* searches on large indexes.
> h2. Problem
> Obviously, applying {{ReversedWildcardFilter}} doubles the index size, and I
> shudder to think about EdgeNGrams...
> h2. Proposal
> _DRY_
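
To put a rough number on the concern in the Problem section, here is a minimal sketch in plain Java of how edge n-gram expansion multiplies the terms that get postings. It is an illustration only: it does not use Lucene's own EdgeNGramTokenFilter, the class and method names (EdgeGramCount, edgeNGrams, expandedTermCount) are made up for this example, and the sample tokens and gram bounds are assumptions. It counts term occurrences, not bytes on disk, so it says nothing about the measured index sizes above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Lucene or of the attached patch.
class EdgeGramCount {

    // Front edge n-grams of a token: its prefixes with length in [minGram, maxGram].
    public static List<String> edgeNGrams(String token, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        int upper = Math.min(maxGram, token.length());
        for (int len = minGram; len <= upper; len++) {
            grams.add(token.substring(0, len));
        }
        return grams;
    }

    // Total number of term occurrences produced when every token is
    // expanded into its edge n-grams; each one would carry its own posting.
    public static int expandedTermCount(List<String> tokens, int minGram, int maxGram) {
        int total = 0;
        for (String token : tokens) {
            total += edgeNGrams(token, minGram, maxGram).size();
        }
        return total;
    }

    public static void main(String[] args) {
        // Sample tokens and gram bounds are assumptions for illustration.
        List<String> tokens = Arrays.asList("lucene", "postings", "index");
        int expanded = expandedTermCount(tokens, 1, 20);
        // 6 + 8 + 5 prefixes: prints "3 tokens -> 19 indexed terms"
        System.out.println(tokens.size() + " tokens -> " + expanded + " indexed terms");
    }
}
```

Every one of those prefixes refers to the same document and position as the original token, which is the repetition the issue title asks to avoid.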