tsuraan wrote:

Sounds interesting. Can you tell us a bit more about the use case for it?
Is it basically for situations where you can't unzip the index?

Indices compress pretty nicely: 30% to 50% in my experience. So, if your indices are read-only anyhow (mine aren't live; we do batch jobs to modify them, so they're mostly read-only), they might as well be stored compressed to save disk space. Compressing files on disk can sometimes improve throughput in general, since disk I/O, rather than CPU, tends to be the bottleneck; I don't know whether that's true of zipped Lucene indices, though.
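A quick way to see how much a particular index would shrink is to DEFLATE each file and compare totals. The sketch below is a hypothetical standalone tool (the class and method names are made up for illustration); it uses only `java.util.zip.Deflater`, assumes a flat index directory, and reads each file fully into memory, so it suits a quick check on a modest index rather than a huge one.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;
import java.util.zip.Deflater;

public class CompressionRatio {
    // Returns the DEFLATE-compressed size of the given bytes at the default level.
    static long deflatedSize(byte[] data) {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION);
        deflater.setInput(data);
        deflater.finish();
        byte[] buf = new byte[8192];
        long total = 0;
        // Keep pulling compressed output until the deflater has flushed everything.
        while (!deflater.finished()) {
            total += deflater.deflate(buf);
        }
        deflater.end();
        return total;
    }

    // Space saved as a fraction: 0.4 means the compressed data is 40% smaller.
    static double savings(long rawBytes, long compressedBytes) {
        return rawBytes == 0 ? 0.0 : 1.0 - (double) compressedBytes / rawBytes;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: java CompressionRatio /path/to/index
        Path indexDir = Paths.get(args.length > 0 ? args[0] : ".");
        long raw = 0, compressed = 0;
        try (Stream<Path> files = Files.list(indexDir)) {
            for (Path p : (Iterable<Path>) files.filter(Files::isRegularFile)::iterator) {
                byte[] data = Files.readAllBytes(p);
                raw += data.length;
                compressed += deflatedSize(data);
            }
        }
        System.out.printf("raw=%d compressed=%d savings=%.1f%%%n",
                raw, compressed, 100 * savings(raw, compressed));
    }
}
```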

Also, have you looked at how it performs?

No, I'm not sure how to do this; what are good benchmarks of store performance? Write speed tends to be a significant thing to test, but my ZipDirectory doesn't support writing. What other operations are commonly done when searching? I could create an IndexReader and call document() and getTermFreqVectors() for each doc in my reader. Is that a useful test, or is there some established set of useful measures for a store?
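As a crude baseline before running real searches, one can time how long it takes simply to read every entry of the zipped index. The sketch below is hypothetical (names invented for illustration) and uses only `java.util.zip.ZipFile`; it measures raw decompression and read throughput, not Lucene query performance, and does a single warm-up pass so the timed pass is less dominated by the first cold read.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class ZipReadTiming {
    // Reads every entry of the archive to the end; returns total uncompressed bytes read.
    static long readAll(ZipFile zip) throws IOException {
        byte[] buf = new byte[64 * 1024];
        long total = 0;
        Enumeration<? extends ZipEntry> entries = zip.entries();
        while (entries.hasMoreElements()) {
            ZipEntry e = entries.nextElement();
            try (InputStream in = zip.getInputStream(e)) {
                int n;
                while ((n = in.read(buf)) != -1) {
                    total += n;
                }
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: java ZipReadTiming index.zip
        try (ZipFile zip = new ZipFile(args[0])) {
            readAll(zip); // warm-up pass
            long start = System.nanoTime();
            long bytes = readAll(zip);
            long elapsed = System.nanoTime() - start;
            System.out.printf("%d bytes in %.1f ms%n", bytes, elapsed / 1e6);
        }
    }
}
```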

You could use contrib/benchmark.

I think query performance for simple term queries, AND, OR, phrase queries, etc. would be interesting to measure.

It sounds like the model is: you use a normal Lucene directory to create the index, then you zip it up, and from then on you can use ZipDirectory to search it.
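The "zip it up" step can be done with the JDK alone. The sketch below is a hypothetical helper, not part of ZipDirectory; it assumes the index directory is flat (as an FSDirectory index normally is) and that ZipDirectory expects each index file as a top-level entry named by its bare file name, which is a guess about its layout. Write the zip outside the index directory so the archive does not try to include itself.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class ZipIndex {
    // Zips every regular file in indexDir into zipFile as a top-level entry
    // named by its bare file name; subdirectories are skipped.
    static int zipDirectory(Path indexDir, Path zipFile) throws IOException {
        int count = 0;
        try (OutputStream os = Files.newOutputStream(zipFile);
             ZipOutputStream zos = new ZipOutputStream(os);
             DirectoryStream<Path> files = Files.newDirectoryStream(indexDir)) {
            for (Path p : files) {
                if (!Files.isRegularFile(p)) {
                    continue;
                }
                zos.putNextEntry(new ZipEntry(p.getFileName().toString()));
                Files.copy(p, zos); // stream the file's bytes into the current entry
                zos.closeEntry();
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: java ZipIndex /path/to/index /elsewhere/index.zip
        int n = zipDirectory(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("zipped " + n + " files");
    }
}
```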

I think this would make a great contribution -- any chance you could package it up and attach a patch to a new Jira issue?

Mike
