It is actually possible in Lucene 4, but there is nothing convenient set up to do this.
You have two choices there:

1. Trigger a massive merge (essentially an optimize) by wrapping all readers and calling IndexWriter.addIndexes(IndexReader...).
2. Wrap the readers in a custom merge policy and do it slowly over time.

In both cases you'd use something like http://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/test-framework/src/java/org/apache/lucene/index/FieldFilterAtomicReader.java

For Lucene 3 this would be more complicated. I don't think it's impossible, but unfortunately there is no code available for that case.

On Mon, Mar 31, 2014 at 11:37 PM, Paul Smith <[email protected]> wrote:
> OK, this is more low-level Lucene, but in the context of an ElasticSearch
> cluster, is there any way to get an index/shard to optimize away a bunch of
> fields that are no longer used (they literally have no term values
> associated with them)?
>
> We had an application bug introduced that polluted an index with a very
> large number of fields (25,000 fields... *cough*), and let's just say things
> weren't well after that.
>
> We've deleted all the rogue records, but the shards still contain the raw
> Lucene Field information (we've inspected these with Luke) and the cluster
> is heavily CPU bound processing "refreshVersionTable" calls, whose cost is,
> in a large loop, a function of the number of fields in the segments.
>
> We've attempted a test optimize of the index using Luke on a single shard,
> but the residual segments post-optimize still contain a large number of
> these fields, all with no values associated with them.
>
> Obviously a reindex would do this, but if there are any other bright ideas
> that are quicker than that (45 million item index we're trying to keep up),
> they would be most welcome!
>
> We're on ES 0.19.10 still (Lucene 3.6.1). (You can tell me "upgrade"
> another day, please.)
>
> Here's a snapshot picture from Luke on a single shard from this index.
>
> cheers!
>
> Paul Smith
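
To make the "wrap the readers, then merge" idea from the reply concrete, here is a minimal plain-Java model of it. It deliberately uses no Lucene classes: ToySegment, FieldFilterSketch, filteredView and dropFields are all hypothetical names invented for illustration. In Lucene 4 the roles would presumably be played by the FieldFilterAtomicReader linked in the reply (the filtering view) and IndexWriter.addIndexes (the merge); treat this as a sketch of why the approach drops the empty fields, not as the actual implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy stand-in for a segment: field name -> terms indexed under that field. */
class ToySegment {
    final Map<String, List<String>> fields = new LinkedHashMap<>();

    /** Registers a field; calling it with no terms models a leftover, empty field. */
    ToySegment with(String field, String... terms) {
        fields.put(field, new ArrayList<>(Arrays.asList(terms)));
        return this;
    }
}

/** Hypothetical helper, not Lucene API: filter each segment's fields, then merge. */
class FieldFilterSketch {

    /** Returns a copy of the segment's field map without the hidden fields. */
    static Map<String, List<String>> filteredView(ToySegment segment, Set<String> hidden) {
        Map<String, List<String>> visible = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : segment.fields.entrySet()) {
            if (!hidden.contains(e.getKey())) {
                visible.put(e.getKey(), e.getValue());
            }
        }
        return visible;
    }

    /** Rewrites every segment through its filtered view into one new segment. */
    static ToySegment dropFields(List<ToySegment> segments, Set<String> hidden) {
        ToySegment merged = new ToySegment();
        for (ToySegment segment : segments) {
            for (Map.Entry<String, List<String>> e : filteredView(segment, hidden).entrySet()) {
                // The merge only ever sees what the view exposes, so hidden
                // fields never reach the newly written segment.
                merged.fields.computeIfAbsent(e.getKey(), k -> new ArrayList<>())
                             .addAll(e.getValue());
            }
        }
        return merged;
    }
}
```

Calling FieldFilterSketch.dropFields(segments, rogueFieldNames) returns a single merged segment whose field map no longer mentions the rogue fields, while the terms of the remaining fields are carried over in segment order.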
