easyice commented on PR #12748:
URL: https://github.com/apache/lucene/pull/12748#issuecomment-1792322904
@mikemccand Thanks for the benchmarking. I also indexed 10 million docs of
random long values and then used `TermInSetQuery` for benchmarking. Here are
the results:
The tip file size is reduced by ~2%:
| | size |
| --- | --- |
| main | 1807149 |
| PR | 1770259 |
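Query latency is reduced by up to ~7%. `termsCount` is the number of terms in
the `TermInSetQuery`, `hitRatio` is the percentage of those terms that are
expected to match, `tookMs` is the total time for 1000 executions of the query,
and `diff` is `tookMs(PR)` as a percentage of `tookMs(main)`.
There is a bit of variance across runs, but the results look good overall.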
| hitRatio | termsCount | tookMs(main) | tookMs(PR) | diff |
| --- | --- | --- | --- | --- |
| 1% | 64 | 177 | 164 | 92.66% |
| 1% | 512 | 1380 | 1312 | 95.07% |
| 1% | 2048 | 5225 | 5022 | 96.11% |
| 25% | 64 | 222 | 212 | 95.50% |
| 25% | 512 | 1462 | 1391 | 95.14% |
| 25% | 2048 | 5602 | 5533 | 98.77% |
| 50% | 64 | 216 | 204 | 94.44% |
| 50% | 512 | 1600 | 1513 | 94.56% |
| 50% | 2048 | 6193 | 5883 | 94.99% |
| 75% | 64 | 224 | 213 | 95.09% |
| 75% | 512 | 1702 | 1598 | 93.89% |
| 75% | 2048 | 6565 | 6289 | 95.80% |
| 100% | 64 | 233 | 218 | 93.56% |
| 100% | 512 | 1752 | 1736 | 99.09% |
| 100% | 2048 | 7057 | 6621 | 93.82% |
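As a quick check of how the `diff` column reads, here is a minimal sketch that recomputes it for three rows of the table. The class and method names (`LatencyDiff`, `ratioPercent`) are made up for illustration and are not part of the benchmark.

```java
import java.util.Locale;

public class LatencyDiff {
    // PR time as a percentage of main time; a value below 100 means the PR is faster.
    static double ratioPercent(long mainMs, long prMs) {
        return 100.0 * prMs / mainMs;
    }

    public static void main(String[] args) {
        // Three rows copied from the table above: {tookMs(main), tookMs(PR)}.
        long[][] rows = {{177, 164}, {1380, 1312}, {7057, 6621}};
        for (long[] row : rows) {
            System.out.println(
                String.format(Locale.ROOT, "%d -> %d: %.2f%%",
                    row[0], row[1], ratioPercent(row[0], row[1])));
        }
    }
}
```

The latency reduction for a row is 100 minus this value, so the first row (1%, 64 terms) is the roughly 7% case.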
Crude benchmark code:
```
// Runs the same query 1000 times and returns the total elapsed time in milliseconds.
public static long doSearch(int termCount, int hitRatio) throws IOException {
    Directory directory = FSDirectory.open(Paths.get("/Volumes/RamDisk/longdata"));
    IndexReader indexReader = DirectoryReader.open(directory);
    IndexSearcher searcher = new IndexSearcher(indexReader);
    // Disable query caching so that every iteration executes the query again.
    searcher.setQueryCachingPolicy(
        new QueryCachingPolicy() {
            @Override
            public void onUse(Query query) {}

            @Override
            public boolean shouldCache(Query query) throws IOException {
                return false;
            }
        });
    long total = 0;
    Query query = getQuery(termCount, hitRatio);
    for (int i = 0; i < 1000; i++) {
        long start = System.currentTimeMillis();
        doQuery(searcher, query);
        long end = System.currentTimeMillis();
        total += end - start;
    }
    // System.out.println("term count: " + termCount + ", took(ms): " + total);
    indexReader.close();
    directory.close();
    return total;
}

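// Builds a TermInSetQuery in which about hitRatio percent of the terms are drawn
// from longs and the rest are random values that are not in uniqueLongs.
// FIELD, RANDOM, longs and uniqueLongs are not shown in this snippet.
private static Query getQuery(int termCount, int hitRatio) {
    int hitCount = termCount * hitRatio / 100;
    int notHitCount = termCount - hitCount;
    List<BytesRef> terms = new ArrayList<>();
    for (int i = 0; i < hitCount; i++) {
        // The bound of nextInt is exclusive, so longs.size() covers every index.
        terms.add(new BytesRef(Long.toString(longs.get(RANDOM.nextInt(longs.size())))));
    }
    Random r = new Random();
    for (int i = 0; i < notHitCount; i++) {
        long v = r.nextLong();
        while (uniqueLongs.contains(v)) {
            v = r.nextLong();
        }
        terms.add(new BytesRef(Long.toString(v)));
    }
    return new TermInSetQuery(FIELD, terms);
}
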
// Executes the query with a collector that stops collecting each segment at its
// first matching document, which keeps the collection cost minimal.
private static void doQuery(IndexSearcher searcher, Query query) throws IOException {
    searcher.search(
        query,
        new Collector() {
            @Override
            public LeafCollector getLeafCollector(LeafReaderContext context)
                    throws IOException {
                return new LeafCollector() {
                    @Override
                    public void setScorer(Scorable scorer) throws IOException {}

                    @Override
                    public void collect(int doc) throws IOException {
                        throw new CollectionTerminatedException();
                    }
                };
            }

            @Override
            public ScoreMode scoreMode() {
                return ScoreMode.COMPLETE_NO_SCORES;
            }
        });
}
```
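To make the hit/miss split in `getQuery` concrete, the sketch below applies the same integer arithmetic to every configuration in the table. `HitSplit` and `split` are hypothetical names used only for this illustration. Because the division truncates, the 1% row with 64 terms ends up with no hit terms at all.

```java
public class HitSplit {
    // Same integer arithmetic as getQuery: returns {hitCount, notHitCount}.
    static int[] split(int termCount, int hitRatio) {
        int hitCount = termCount * hitRatio / 100; // integer division truncates
        return new int[] {hitCount, termCount - hitCount};
    }

    public static void main(String[] args) {
        int[] ratios = {1, 25, 50, 75, 100};
        int[] counts = {64, 512, 2048};
        for (int ratio : ratios) {
            for (int count : counts) {
                int[] s = split(count, ratio);
                System.out.println(
                    ratio + "% of " + count + ": " + s[0] + " hits, " + s[1] + " misses");
            }
        }
    }
}
```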