[ https://issues.apache.org/jira/browse/LUCENE-5476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13924037#comment-13924037 ]
Rob Audenaerde commented on LUCENE-5476:
----------------------------------------

{quote}
...Given our test framework, randomness is not a big deal at all, since once we get a test failure, we can deterministically reproduce the failure (when there is no multi-threading)...
{quote}

Ok, this makes sense to me.

{quote}
It looks like it hasn't changed? I mean besides the rename. So if I set sampleSize=100K, it's 100K whether there are 101K docs or 100M docs, right? Is that your intention?
{quote}

Correct, that is my intention. I actually prefer not to increase the {{sampleSize}} with more hits: bigger samples are slower, 100K is a nice sample size anyway, and more hits already means more time. I adjust the sampleRatio so that the resulting set of documents is (close to) the {{sampleSize}}.

{quote}
I find this assert just redundant – if we always expect 5, we shouldn't assert that we received 5. If we say that very infrequently we might get <5 and we're OK with it .. what's the point of asserting that at all?
{quote}

Agreed on the <5. Asserting seems redundant, but is that not the point of unit tests? The trick is that the assertion should still hold if you change the implementation. I will add more next week.

Btw, is there an easy way to retrieve the total facet counts for an ordinal? When correcting facet counts, it would be a quick win to limit the number of estimated documents to the actual number of documents in the index that match that facet.
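
For illustration only, here is a minimal plain-Java sketch of the two ideas above: deriving the sampling ratio from a fixed target sample size, and capping a scaled-up facet count at the number of documents known to carry that facet. It is not the code from the attached patch and does not use any Lucene API. The class and method names ({{FacetSamplingSketch}}, {{sampleRatio}}, {{sample}}, {{correctedCount}}) are hypothetical, and the formula ratio = sampleSize / totalHits is an assumption about what "adjusting the sampleRatio" means.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical illustration; none of these names come from Lucene or the patch.
final class FacetSamplingSketch {

    // Assumed formula: choose the ratio so the expected sample stays near
    // sampleSize, no matter how many hits there are.
    public static double sampleRatio(int sampleSize, int totalHits) {
        if (totalHits <= sampleSize) {
            return 1.0; // fewer hits than the target size: keep every hit
        }
        return (double) sampleSize / totalHits;
    }

    // Keeps each hit with probability `ratio`. A fixed seed makes the
    // selection reproducible, which helps when replaying a failing test.
    public static int[] sample(int[] hits, double ratio, long seed) {
        Random random = new Random(seed);
        int[] kept = new int[hits.length];
        int n = 0;
        for (int doc : hits) {
            if (random.nextDouble() < ratio) {
                kept[n++] = doc;
            }
        }
        return Arrays.copyOf(kept, n);
    }

    // Scales a count observed in the sample back up, then caps it at the
    // number of documents in the index known to match the facet.
    public static long correctedCount(long sampledCount, double ratio, long docsWithFacet) {
        long estimate = Math.round(sampledCount / ratio);
        return Math.min(estimate, docsWithFacet);
    }
}
```

Under these assumptions, a {{sampleSize}} of 100K gives a ratio of about 0.99 for 101K hits and 0.001 for 100M hits, so the expected sample stays near 100K in both cases. The cap in {{correctedCount}} only helps if the per-facet document total is cheap to obtain, which is the open question asked above.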
(And maybe use the distribution as well, to make better estimates.)

> Facet sampling
> --------------
>
>                 Key: LUCENE-5476
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5476
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Rob Audenaerde
>         Attachments: LUCENE-5476.patch, LUCENE-5476.patch, LUCENE-5476.patch, LUCENE-5476.patch, LUCENE-5476.patch, LUCENE-5476.patch, LUCENE-5476.patch, SamplingComparison_SamplingFacetsCollector.java, SamplingFacetsCollector.java
>
>
> With LUCENE-5339 facet sampling disappeared.
> When trying to display facet counts on large datasets (>10M documents) counting facets is rather expensive, as all the hits are collected and processed.
> Sampling greatly reduced this and thus provided a nice speedup. Could it be brought back?

--
This message was sent by Atlassian JIRA
(v6.2#6252)