tsuraan wrote:
Sounds interesting. Can you tell us a bit more about the use case
for it?
Is it basically you are in a situation where you can't unzip the
index?
Indices compress pretty nicely: 30% to 50% in my experience. So, if
youre
indices are read-only anyhow (mine aren't live; we do batch jobs to
modify
them, so they're mostly read-only), they might as well be stored
compressed
to save on disk usage. Sometimes on-disk compression of files (in
general)
can help throughput, since the drive IO tends to be a bottleneck
rather than
the CPU load; I don't know whether that's true of zipped lucene
indices
though.
Also, have you looked at how it performs?
No, I'm not sure how to do this; what are good benchmarks of store
performance? Write speed tends to be a significant thing to test,
but my
ZipDirectory doesn't support writing. What other operations tend to
be
commonly done in searching? I could create an IndexReader and call
document
and getTermFreqVectors for each doc in my reader. Is that a useful
test, or
is there some established body of useful measures on a store?
You could use contrib/benchmark.
I think query performance, for simple term queries, AND, OR, phrase,
etc., would be interesting.
It sounds like the model is, you use a normal Lucene directory to
create the index, then you zip it up, at which point you can then use
ZipDirectory to search it.
I think this would make a great contribution -- any chance you could
package it up and attach a patch to a new Jira issue?
Mike
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org