I'm pretty confused about points as well and until very recently thought
these were geospatial improvements only.

It would be good to understand the mechanics of points versus numerics. I'm
particularly interested in not losing the high performance numeric
DocValues support, which has become so important for analytics.
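
As a rough illustration of the mechanics (my understanding, not a statement
of Lucene's actual code): a scalar can be treated as a one-dimensional point
by encoding it as fixed-width bytes whose unsigned byte order matches numeric
order. A range query then becomes a byte comparison against a lower and an
upper bound. The class and method names below are hypothetical and use only
the JDK. This concerns only the search-side index; doc values are a separate
per-document structure.

```java
public class ScalarAsPoint {

    // Encode an int as 4 big-endian bytes with the sign bit flipped, so that
    // unsigned byte-wise comparison agrees with signed numeric order.
    public static byte[] encode(int value) {
        int flipped = value ^ 0x80000000;
        return new byte[] {
            (byte) (flipped >>> 24),
            (byte) (flipped >>> 16),
            (byte) (flipped >>> 8),
            (byte) flipped
        };
    }

    // Reverse of encode(): reassemble the int and flip the sign bit back.
    public static int decode(byte[] bytes) {
        int flipped = ((bytes[0] & 0xFF) << 24)
                    | ((bytes[1] & 0xFF) << 16)
                    | ((bytes[2] & 0xFF) << 8)
                    | (bytes[3] & 0xFF);
        return flipped ^ 0x80000000;
    }

    // Compare two equal-length byte arrays as unsigned values.
    public static int compareUnsigned(byte[] a, byte[] b) {
        for (int i = 0; i < a.length; i++) {
            int cmp = Integer.compare(Byte.toUnsignedInt(a[i]), Byte.toUnsignedInt(b[i]));
            if (cmp != 0) {
                return cmp;
            }
        }
        return 0;
    }

    // Inclusive range test on a single dimension: lower <= point <= upper.
    public static boolean inRange(byte[] point, byte[] lower, byte[] upper) {
        return compareUnsigned(lower, point) <= 0 && compareUnsigned(point, upper) <= 0;
    }

    public static void main(String[] args) {
        byte[] lower = encode(-10);
        byte[] upper = encode(3);
        System.out.println(inRange(encode(-5), lower, upper)); // true
        System.out.println(inRange(encode(7), lower, upper));  // false
        System.out.println(decode(encode(-5)));                // -5
    }
}
```

With more than one dimension, each dimension would get its own encoded
slot and the range test would apply per dimension, which is why a plain
number can be viewed as the one-dimensional case.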

Joel Bernstein
http://joelsolr.blogspot.com/

On Thu, Mar 24, 2016 at 11:37 AM, David Smiley <[email protected]>
wrote:

> bq. it wasn't at all clear that the intention was that simple scalars
> would now and forever henceforth be referred to as "points". My impression
> at the time was that the focus of the Jira was on implementation and
> storage level indexing detail rather than the user-facing API level. I see
> now that I was wrong about that. It just seems to me that there should have
> been a more direct public discussion of eliminating the concept of scalar
> values at the API level.
>
> I knew because I was following closely, but otherwise I agree with your
> sentiment.  I don't love the "PointValues" terminology either nor did I
> like "DimensionalValues"; I should have suggested alternatives at the time
> but the Mike & Rob tag-team were working so fast that I didn't interject in
> the narrow window of time before a patch was put up with the current
> names.  More time to publicly discuss would have been better.  FWIW I like
> your suggestion for "Scalar"; that's more meaningful to me.  Naming is hard.
>
> ~ David
>
> On Thu, Mar 24, 2016 at 11:28 AM Jack Krupansky <[email protected]>
> wrote:
>
>> I wasn't paying close attention when this whole PointValues saga was
>> unfolding. I get the value of points for spatial data, but conflating the
>> terms "point" and "numeric" is bizarre to say the least. Reading the code,
>> I see "Points represent numeric values", which seems nonsensical to me. A
>> little later the code comment says "Geospatial Point Types - Although basic
>> point types such as DoublePoint support points in multi-dimensional space
>> too, Lucene has specialized classes for location data...", which continues
>> this odd use of terminology. I mean, aren't all points spatial by
>> definition, so that "Geospatial Point" is redundant? It would make more
>> sense to speak of a point as a geospatial number, or that a point is
>> represented by numbers.
>>
>> IOW, NumericValues would make more sense as the base, with (spatial)
>> PointValues derived from the base of numeric values. At least to me that
>> would make more sense.
>>
>> As the PointValues was progressing I had no idea that its intent was to
>> subsume, replace, or deprecate traditional scalar numeric value support in
>> Lucene (or Solr.) It came across primarily as being an improvement for
>> spatial search.
>>
>> Not that I have any objection to greatly improved storage in Lucene, but
>> to now have to speak of all numeric data as points seems quite... weird.
>>
>> Sure, I saw the Jira traffic, like LUCENE-6825 (Add multidimensional
>> byte[] indexing support to Lucene) and LUCENE-6852 (Add DimensionalFormat
>> to Codec), but in all honesty that really did come across as relating to
>> purely spatial data and not being applicable to basic scalar number support.
>>
>> Looking at CHANGES.TXT, I see references like "LUCENE-6852, LUCENE-6975:
>> Add support for points (dimensionally indexed values)", but without any
>> hint that the intent was to subsume or replace non-dimensional numeric
>> indexed values.
>>
>> Now for all I know, non-dimensional (scalar) numeric data can very
>> efficiently be handled as if it had dimension, but that's not exactly
>> obvious and warrants at least some illumination. In traditional terminology
>> a point is 0-dimension (a line is 1-dimension, and a plane is 2-dimension),
>> but traditionally a raw number - a scalar - hasn't been referred to as
>> having dimension, so that is a new concept warranting clear definition.
>>
>> Yeah, I do recall seeing LUCENE-6917 (Deprecate and rename
>> NumericField/RangeQuery to LegacyNumeric) go by in the Jira traffic, and
>> shame on me for not reading the details more carefully, but it wasn't at
>> all clear that the intention was that simple scalars would now and forever
>> henceforth be referred to as "points". My impression at the time was that
>> the focus of the Jira was on implementation and storage level indexing
>> detail rather than the user-facing API level. I see now that I was wrong
>> about that. It just seems to me that there should have been a more direct
>> public discussion of eliminating the concept of scalar values at the API
>> level.
>>
>> (I wonder what physics would be like if they started referring to scalar
>> quantities as vectors.)
>>
>> My apologies for the rant.
>>
>>
>> -- Jack Krupansky
>>
>> On Thu, Mar 24, 2016 at 10:34 AM, David Smiley <[email protected]>
>> wrote:
>>
>>> With the move to PointValues and away from trie based indexing of the
>>> terms index, for numerics, everything associated with the trie stuff seems
>>> to be labelled as "Legacy" and marked deprecated.  Even
>>> FieldType.NumericType (now FieldType.LegacyNumericType) -- a simple enum of
>>> INT, LONG, FLOAT, DOUBLE.  I wonder if we ought to reconsider doing this
>>> for FieldType.NumericType, as it articulates the type of numeric data; it
>>> need not be associated with just trie indexing of terms data; it could
>>> articulate how any numeric data is encoded, be it docValues or
>>> pointValues.  This is useful metadata.  It's not strictly required, true,
>>> but it's useful in describing what goes in the field.  This makes a
>>> FieldType instance fairly self-sufficient.  Otherwise, say you have
>>> docValue numerics and/or pointValues, it's ambiguous how the data should be
>>> interpreted.  This doesn't lead to a bug but would help with debugging and
>>> allow APIs to express field requirements simply by providing a FieldType
>>> instance for numeric data.  It used to be self-sufficient but now if we
>>> imagine the legacy stuff being removed, it's ambiguous.  In addition, it
>>> would be useful metadata if it found its way into FieldInfo.  Then, say
>>> Luke, could help you know what's there and maybe search it.
>>>
>>> Thoughts?
>>>
>>> ~ David
>>> --
>>> Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
>>> LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
>>> http://www.solrenterprisesearchserver.com
>>>
>>
