[jira] [Commented] (SOLR-6386) make secondary ordering of facet.field values (and facet.pivot?) consistently deterministic

Erick Erickson (JIRA) Mon, 08 Sep 2014 09:01:46 -0700

    [ 
https://issues.apache.org/jira/browse/SOLR-6386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125694#comment-14125694
 ]


Erick Erickson commented on SOLR-6386:
--------------------------------------

[[email protected]] Some things I found out this weekend:
[[email protected]] Pinging you on this because I half suspect that 
there's something weird with the test infrastructure.

Frankly I'm at a loss, but here are the outstanding things I saw. I'm pretty 
sure the answer to my question of whether this would "just get taken care of" 
by the work I'm doing for SOLR-6187 is "no", so I'm assigning it back to 
nobody. Adding facet.limit=1 in the test makes the problem disappear, but only 
because all the bogus 0 counts that get returned are removed.

> If I optimize the clients and control server in 
> BaseDistributedSearchTestCase.commit, then this test case does NOT fail. But 
> I must optimize both. If I just optimize the control, it fails. If I just 
> optimize the clients it fails. This really weirds me out. I suspected pilot 
> error here frankly, so I just tried it again and I'm pretty sure I'm not 
> hallucinating. I'd expect optimizing the distributed case would fix this up 
> but nooooo. So I wonder if there's something weird here with RAMDirectory 
> which underpins the servers.... Although just for yucks I tried using a 
> disk-based directory and it still seemed to fail although I won't swear that 
> I got it right.

> I set up IntelliJ with the seeds etc. you provided and it's not until the 
> third pass that it fails. But it fails every time on the third pass. Ditto 
> with running the test from the command shell.

> In DocValuesFacet.getCount, around line 200 or so, I'm printing out the 
> values added. The print statement is near the bottom of this clause:
    if (sort.equals(FacetParams.FACET_SORT_COUNT)
        || sort.equals(FacetParams.FACET_SORT_COUNT_LEGACY)) {
      ... near the end
    } else...

On the pass that fails, I get these values:
   [junit4]   1> QUERY DUMP 1 Adding string/count 2010-05-03T11:00:00Z 1
   [junit4]   1> QUERY DUMP 1 Adding string/count 2010-05-03T11:00:00Z 0
   [junit4]   1> QUERY DUMP 1 Adding string/count 2010-04-20T11:00:00Z 1
   [junit4]   1> QUERY DUMP 1 Adding string/count 2010-05-03T10:59:56.032Z 0
   [junit4]   1> QUERY DUMP 1 Adding string/count 2010-05-03T11:00:00Z 1
   [junit4]   1> QUERY DUMP 1 Adding string/count 2010-05-03T10:57:12.192Z 0
   [junit4]   1> QUERY DUMP 1 Adding string/count 2010-05-02T11:00:00Z 1
   [junit4]   1> QUERY DUMP 1 Adding string/count 2010-05-03T07:10:00.704Z 0
   [junit4]   1> QUERY DUMP 1 Adding string/count 2010-05-05T11:00:00Z 1
   [junit4]   1> QUERY DUMP 1 Adding string/count 2010-04-27T16:01:01.44Z 0
   [junit4]   1> QUERY DUMP 1 Adding string/count 2009-03-13T13:23:01.248Z 0
   [junit4]   1> QUERY DUMP 1 Adding string/count 1970-01-01T00:00:00Z 0
   [junit4]   1> QUERY DUMP 1 Adding string/count 1970-01-01T00:00:00Z 0
   [junit4]   1> QUERY DUMP 1 Adding string/count 1970-01-01T00:00:00Z 0
   [junit4]   1> QUERY DUMP 1 Adding string/count 1970-01-01T00:00:00Z 0

Notice the Jan 1, 1970 dates. It sure seems like a zero snuck in there 
somewhere. If you sum up the non-zero counts, you wind up with the right facet 
counts.
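
To make the "sum up the non-zero counts" check concrete, here is a minimal sketch that folds the string/count pairs from both dumps in this comment into per-term totals. FacetDumpSummary and its methods are hypothetical helpers written for this illustration, not Solr code; the data is copied from the QUERY DUMP lines of the failing pass and of the optimized pass.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not part of Solr) that totals the dumped string/count pairs. */
final class FacetDumpSummary {

    /** Sums counts per term, skipping every zero-count entry. */
    static Map<String, Integer> sumNonZero(String[][] pairs) {
        Map<String, Integer> totals = new LinkedHashMap<>();
        for (String[] pair : pairs) {
            int count = Integer.parseInt(pair[1]);
            if (count == 0) {
                continue; // bogus zero entry, including the 1970-01-01 ones
            }
            totals.merge(pair[0], count, Integer::sum);
        }
        return totals;
    }

    /** The pairs printed on the failing (third) pass, in dump order. */
    static String[][] failingPass() {
        return new String[][] {
            {"2010-05-03T11:00:00Z", "1"}, {"2010-05-03T11:00:00Z", "0"},
            {"2010-04-20T11:00:00Z", "1"}, {"2010-05-03T10:59:56.032Z", "0"},
            {"2010-05-03T11:00:00Z", "1"}, {"2010-05-03T10:57:12.192Z", "0"},
            {"2010-05-02T11:00:00Z", "1"}, {"2010-05-03T07:10:00.704Z", "0"},
            {"2010-05-05T11:00:00Z", "1"}, {"2010-04-27T16:01:01.44Z", "0"},
            {"2009-03-13T13:23:01.248Z", "0"}, {"1970-01-01T00:00:00Z", "0"},
            {"1970-01-01T00:00:00Z", "0"}, {"1970-01-01T00:00:00Z", "0"},
            {"1970-01-01T00:00:00Z", "0"},
        };
    }

    /** The pairs printed on the third pass when both sides are optimized. */
    static String[][] optimizedPass() {
        return new String[][] {
            {"2010-04-20T11:00:00Z", "1"}, {"2010-05-03T11:00:00Z", "1"},
            {"2010-05-03T11:00:00Z", "1"}, {"2010-05-02T11:00:00Z", "1"},
            {"2010-05-05T11:00:00Z", "1"},
        };
    }

    public static void main(String[] args) {
        Map<String, Integer> failing = sumNonZero(failingPass());
        Map<String, Integer> optimized = sumNonZero(optimizedPass());
        System.out.println(failing);
        // Map.equals ignores insertion order, so this compares totals only.
        System.out.println("same totals: " + failing.equals(optimized));
    }
}
```

Both dumps reduce to the same four terms with the same totals (2010-05-03T11:00:00Z has 2, the other three have 1 each), so the totals agree and only the zero-count entries and the ordering differ.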

When both sides are optimized, I get this on the third pass, which is 
consistent with what the control server gives back, so the test passes:
   [junit4]   1> QUERY DUMP 1 Adding string/count 2010-04-20T11:00:00Z 1
   [junit4]   1> QUERY DUMP 1 Adding string/count 2010-05-03T11:00:00Z 1
   [junit4]   1> QUERY DUMP 1 Adding string/count 2010-05-03T11:00:00Z 1
   [junit4]   1> QUERY DUMP 1 Adding string/count 2010-05-02T11:00:00Z 1
   [junit4]   1> QUERY DUMP 1 Adding string/count 2010-05-05T11:00:00Z 1

Anyway, this is beyond what I want to deal with just now. Let me know if 
there's anything else I can provide. 


> make secondary ordering of facet.field values (and facet.pivot?) consistently 
> deterministic
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-6386
>                 URL: https://issues.apache.org/jira/browse/SOLR-6386
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Hoss Man
>            Assignee: Erick Erickson
>
> as a fluke of how the SOLR-2894 patch evolved, it wound up adding a bit of 
> testing of distributed facet.field on date fields (see [r1617789 changes to 
> TestDistributedSearch|https://svn.apache.org/viewvc/lucene/dev/trunk/solr/core/src/test/org/apache/solr/TestDistributedSearch.java?r1=1617789&r2=1617788&pathrev=1617789])
>  ... but this started triggering some random failures due to facet 
> constraints with identical values being sorted differently between the 
> distributed query and the single node control query.
> We should make the facet.field (and facet.pivot) code order constraints with 
> tied counts consistently regardless of whether it's a distrib search or not.
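
As an illustration of what "order constraints with tied counts consistently" could look like, here is a minimal sketch of a comparator that sorts by count descending and breaks ties on the term itself. This is not Solr's facet code: TiedFacetOrder and Constraint are hypothetical names, and using ascending term order as the secondary key is an assumption made only for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch (not Solr code) of a deterministic secondary ordering. */
final class TiedFacetOrder {

    /** One facet constraint: a term and how many documents matched it. */
    static final class Constraint {
        final String term;
        final long count;

        Constraint(String term, long count) {
            this.term = term;
            this.count = count;
        }
    }

    /** Highest count first; equal counts fall back to ascending term order (assumed). */
    static final Comparator<Constraint> COUNT_THEN_TERM =
        Comparator.comparingLong((Constraint c) -> c.count)
            .reversed()
            .thenComparing(c -> c.term);

    /** Returns the terms in sorted order without modifying the input list. */
    static List<String> sortedTerms(List<Constraint> constraints) {
        List<Constraint> copy = new ArrayList<>(constraints);
        copy.sort(COUNT_THEN_TERM);
        List<String> terms = new ArrayList<>();
        for (Constraint c : copy) {
            terms.add(c.term);
        }
        return terms;
    }

    public static void main(String[] args) {
        // The same constraints arriving in two different orders, as might happen
        // between a distributed query and a single-node control query.
        List<Constraint> distrib = Arrays.asList(
            new Constraint("2010-05-05T11:00:00Z", 1),
            new Constraint("2010-05-03T11:00:00Z", 2),
            new Constraint("2010-04-20T11:00:00Z", 1),
            new Constraint("2010-05-02T11:00:00Z", 1));
        List<Constraint> control = Arrays.asList(
            new Constraint("2010-05-02T11:00:00Z", 1),
            new Constraint("2010-04-20T11:00:00Z", 1),
            new Constraint("2010-05-03T11:00:00Z", 2),
            new Constraint("2010-05-05T11:00:00Z", 1));
        System.out.println(sortedTerms(distrib));
        System.out.println(sortedTerms(control).equals(sortedTerms(distrib)));
    }
}
```

With an explicit secondary key, the order of the tied constraints no longer depends on the order in which they were collected, which is the property the test comparison needs.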



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

