Greetings

Are there any plans to implement compression of the variable-length bytes[] BinaryDocValues,
say in blocks of 16 KB, as is done for stored fields?

My .cfs file shrinks from 2 MB to roughly 400 KB when I zip it.
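
In the meantime, one workaround is to DEFLATE each value application-side before indexing it. This is only a minimal sketch under my own assumptions (the helper names are illustrative, and per-value compression cannot exploit redundancy across values the way 16 KB block compression would, so the gains depend on the values being individually compressible):

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;

import org.apache.lucene.document.BinaryDocValuesField;
import org.apache.lucene.document.Document;
import org.apache.lucene.util.BytesRef;

// Illustrative helper: DEFLATE-compress a value before storing it as
// binary doc values; whatever reads the field must inflate it again.
static byte[] deflate(byte[] raw) throws IOException {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    try (DeflaterOutputStream out =
            new DeflaterOutputStream(bos, new Deflater(Deflater.BEST_COMPRESSION))) {
        out.write(raw);
    }
    return bos.toByteArray();
}

static void addCompressed(Document doc, String field, byte[] raw) throws IOException {
    doc.add(new BinaryDocValuesField(field, new BytesRef(deflate(raw))));
}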

Best regards,
Olivier



On 08/08/2015 02:32 PM, jamie wrote:
Greetings

Our app primarily uses Lucene for its intended purpose, i.e. to search across large amounts of unstructured text. Recently, however, our requirement expanded to performing look-ups on specific documents in the index based on associated custom-defined unique keys. For our purposes, a unique key is the string representation of a 128-bit murmur hash, stored in a Lucene field named uid. We are currently using a TermsFilter to look up documents in the Lucene index as follows:

List<Term> terms = new LinkedList<>();
for (String id : ids) {
    terms.add(new Term("uid", id));
}
TermsFilter idFilter = new TermsFilter(terms);
// ... search logic ...

At any time we may need to look up, say, a couple of thousand documents. Our problem is one of performance. On very large indexes with 30 million records or more, the lookup can be excruciatingly slow. At this stage, it's not practical for us to move the data over to a fit-for-purpose database, nor to change the uid field to a numeric type. I fully appreciate the fact that Lucene is not designed to be a database; however, is there anything we can do to improve the performance of these look-ups?
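
One approach that sometimes helps for this kind of primary-key lookup is to bypass the filter machinery and seek directly in each segment's terms dictionary, sorting the ids first so the seeks walk the dictionary in order. A rough sketch against the Lucene 5.x API (the method name and the collection step are placeholders, not a definitive implementation):

import java.io.IOException;
import java.util.Collections;
import java.util.List;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.PostingsEnum;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.util.Bits;
import org.apache.lucene.util.BytesRef;

// Resolve unique keys to global doc ids with direct terms-dictionary
// seeks per segment, rather than building one large filter.
static void lookupByUid(IndexReader reader, List<String> ids) throws IOException {
    Collections.sort(ids); // ordered seeks move forward through the dictionary
    for (LeafReaderContext leaf : reader.leaves()) {
        Terms terms = leaf.reader().terms("uid");
        if (terms == null) {
            continue;
        }
        TermsEnum te = terms.iterator();
        Bits live = leaf.reader().getLiveDocs();
        PostingsEnum pe = null;
        for (String id : ids) {
            if (te.seekExact(new BytesRef(id))) {
                pe = te.postings(pe, PostingsEnum.NONE);
                int doc = pe.nextDoc();
                if (doc != DocIdSetIterator.NO_MORE_DOCS
                        && (live == null || live.get(doc))) {
                    int globalDoc = leaf.docBase + doc; // segment-relative -> global
                    // ... collect globalDoc ...
                }
            }
        }
    }
}

The liveDocs check handles deletions, and since the key is unique each term carries at most one live posting, so a single nextDoc() per hit suffices.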

Much appreciated

Jamie




