[ 
https://issues.apache.org/jira/browse/SOLR-10123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16068767#comment-16068767
 ] 

Hoss Man commented on SOLR-10123:
---------------------------------

(FWIW Houston, attaching patches showing your progress/attempts makes it easier 
for people to follow along with exactly what you're doing and offer meaningful 
ideas/suggestions)

bq. However the randomized doc-values cannot be used since docValues are 
required for almost all Analytics Component functionality.

That's fine -- if the feature requires docValues it requires docValues.  The 
main reasons the docValue randomization was added were:
* to help catch bugs/assumptions in code related to docValues
* so tests for things like facets (which work with non-dv tries, but require 
dv's for points) could do this...{code}
@BeforeClass
public static void beforeClass() throws Exception {
  // we need DVs on point fields to compute stats & facets
  if (Boolean.getBoolean(NUMERIC_POINTS_SYSPROP)) {
    System.setProperty(NUMERIC_DOCVALUES_SYSPROP, "true");
  }
}
{code}

bq. Almost all tests pass now, however there is a difference between 
SortedSetDocValues (TrieField) and SortedNumericDocValues (PointField) that 
might make this impossible. ...

What you're talking about is noted in SOLR-10924.  Personally i consider it a 
feature of Points fields.  

How we deal with it depends largely on what folks think the "right" behavior is 
and how it should be documented.  From an end user standpoint i think it's 
*great* -- they'll have an accurate statistical representation of the data they 
put in, and if they don't want duplicate values considered they shouldn't put 
the dups in. (ie: document it as a limitation of using Trie numerics, not a 
"bug" in Points)
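
To make the discrepancy concrete, here is a minimal sketch using plain Java collections rather than the Lucene docValues APIs. The class and method names are hypothetical; the only thing taken as given is that a sorted-set view collapses duplicate values within a document while a sorted-numeric view keeps them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;

class DupValuesSketch {
  // Trie-like view: sorted *set* semantics, duplicates within a doc collapse.
  static List<Long> asSortedSet(List<Long> docValues) {
    return new ArrayList<>(new TreeSet<>(docValues));
  }

  // Point-like view: sorted *numeric* semantics, duplicates are kept.
  static List<Long> asSortedNumeric(List<Long> docValues) {
    List<Long> sorted = new ArrayList<>(docValues);
    Collections.sort(sorted);
    return sorted;
  }

  // Simple statistic that is sensitive to duplicates.
  static double mean(List<Long> values) {
    long sum = 0;
    for (long v : values) {
      sum += v;
    }
    return (double) sum / values.size();
  }

  public static void main(String[] args) {
    // One document with a multi-valued field containing a duplicate.
    List<Long> doc = Arrays.asList(10L, 10L, 40L);
    System.out.println(mean(asSortedSet(doc)));     // mean over {10, 40}
    System.out.println(mean(asSortedNumeric(doc))); // mean over [10, 10, 40]
  }
}
```

With the same input the two views give different means (25.0 vs 20.0), which is the kind of mismatch a test sees once the field type is randomized.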

How it affects the tests and what should be done there is a harder question 
because I have no idea how much this impacts the existing tests with your 
current working changes.

One approach is to leave the test data in place, leave the duplicate values in 
place, and account for the discrepancy in the assertions -- ala 
TestExportWriter.testDuplicates()

A different approach would be to change the tests to ensure they don't use 
duplicates in their test data, so the numbers are equivalent regardless of the 
underlying implementation.

A third option is to eliminate the points randomization completely -- i 
wouldn't advise this unless the other options are for some reason completely 
impossible -- and systematically test both Trie fields and Point fields with 
different tests that know about the different behavior.
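
As a rough illustration of the first two options, the sketch below uses hypothetical helpers that are not part of the Solr test framework. The `usingPoints` flag is an assumption; a real test might derive it from something like Boolean.getBoolean(NUMERIC_POINTS_SYSPROP).

```java
import java.util.HashSet;
import java.util.List;

class ExpectedValuesSketch {
  // Option 1: keep the duplicates in the test data and compute the expected
  // per-document value count according to the field implementation in use.
  // usingPoints is a hypothetical flag supplied by the test.
  static int expectedCount(List<Integer> docValues, boolean usingPoints) {
    return usingPoints
        ? docValues.size()                  // points keep every value
        : new HashSet<>(docValues).size();  // trie fields see each value once
  }

  // Option 2: guard that a document's test data has no duplicates, so the
  // expected numbers are the same for either implementation.
  static boolean hasNoDuplicates(List<Integer> docValues) {
    return new HashSet<>(docValues).size() == docValues.size();
  }
}
```

A test following option 1 would compare the response against expectedCount(values, usingPoints); a test following option 2 could check hasNoDuplicates(values) while building its documents and then use a single expectation.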

But as things stand right now, this jira claims the new code works with Point 
fields, but this claim is not backed up by any new testing, so _something_ 
needs to change.





> Analytics Component 2.0
> -----------------------
>
>                 Key: SOLR-10123
>                 URL: https://issues.apache.org/jira/browse/SOLR-10123
>             Project: Solr
>          Issue Type: New Feature
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Houston Putman
>              Labels: features
>         Attachments: SOLR-10123.patch, SOLR-10123.patch, SOLR-10123.patch
>
>
> A completely redesigned Analytics Component, introducing the following 
> features:
> * Support for distributed collections
> * New JSON request language, and response format that fits JSON better.
> * Faceting over mapping functions in addition to fields (Value Faceting)
> * PivotFaceting with ValueFacets
> * More advanced facet sorting
> * Support for PointField types
> * Expressions over multi-valued fields
> * New types of mapping functions
> ** Logical
> ** Conditional
> ** Comparison
> * Concurrent request execution
> * Custom user functions, defined within the request
> Fully backwards compatible with the original Analytics Component with the 
> following exceptions:
> * All fields used must have doc-values enabled
> * Expression results can no longer be used when defining Range and Query 
> facets
> * The reverse(string) mapping function is no longer a native function


