I wasn't paying close attention when this whole PointValues saga was
unfolding. I get the value of points for spatial data, but conflating the
terms "point" and "numeric" is bizarre to say the least. Reading the code,
I see "Points represent numeric values", which seems nonsensical to me. A
little later the code comment says "Geospatial Point Types - Although basic
point types such as DoublePoint support points in multi-dimensional space
too, Lucene has specialized classes for location data...", which continues
this odd use of terminology. I mean, aren't all points spatial by
definition, so that "Geospatial Point" is redundant? It would make more
sense to speak of a point as a geospatial number, or that a point is
represented by numbers.

IOW, NumericValues would make more sense as the base, with (spatial)
PointValues derived from the base of numeric values. At least to me that
would make more sense.

As the PointValues work was progressing I had no idea that its intent was to
subsume, replace, or deprecate traditional scalar numeric value support in
Lucene (or Solr). It came across primarily as being an improvement for
spatial search.

Not that I have any objection to greatly improved storage in Lucene, but to
now have to speak of all numeric data as points seems quite... weird.

Sure, I saw the Jira traffic, like LUCENE-6825 (Add multidimensional byte[]
indexing support to Lucene) and LUCENE-6852 (Add DimensionalFormat to
Codec), but in all honesty that really did come across as relating to
purely spatial data and not being applicable to basic scalar number support.

Looking at CHANGES.txt, I see references like "LUCENE-6852, LUCENE-6975:
Add support for points (dimensionally indexed values)", but without any
hint that the intent was to subsume or replace non-dimensional numeric
indexed values.

Now for all I know, non-dimensional (scalar) numeric data can very
efficiently be handled as if it had dimension, but that's not exactly
obvious and warrants at least some illumination. In traditional terminology
a point is 0-dimensional (a line is 1-dimensional, and a plane is
2-dimensional), but traditionally a raw number - a scalar - hasn't been
referred to as having dimension, so that is a new concept warranting clear
definition.

Yeah, I do recall seeing LUCENE-6917 (Deprecate and rename
NumericField/RangeQuery to LegacyNumeric) go by in the Jira traffic, and
shame on me for not reading the details more carefully, but it wasn't at
all clear that the intention was that simple scalars would now and forever
henceforth be referred to as "points". My impression at the time was that
the focus of the Jira was on implementation and storage level indexing
detail rather than the user-facing API level. I see now that I was wrong
about that. It just seems to me that there should have been a more direct
public discussion of eliminating the concept of scalar values at the API
level.

(I wonder what physics would be like if they started referring to scalar
quantities as vectors.)

My apologies for the rant.


-- Jack Krupansky

On Thu, Mar 24, 2016 at 10:34 AM, David Smiley <[email protected]>
wrote:

> With the move to PointValues and away from trie based indexing of the
> terms index, for numerics, everything associated with the trie stuff seems
> to be labelled as "Legacy" and marked deprecated.  Even
> FieldType.NumericType (now FieldType.LegacyNumericType) -- a simple enum of
> INT, LONG, FLOAT, DOUBLE.  I wonder if we ought to reconsider doing this
> for FieldType.NumericType, as it articulates the type of numeric data; it
> need not be associated with just trie indexing of terms data; it could
> articulate how any numeric data is encoded, be it docValues or
> pointValues.  This is useful metadata.  It's not strictly required, true,
> but it's useful in describing what goes in the field.  This makes a
> FieldType instance fairly self-sufficient.  Otherwise, say you have
> docValue numerics and/or pointValues, it's ambiguous how the data should be
> interpreted.  This doesn't lead to a bug but would help debugging and
> allowing APIs to express field requirements simply by providing a FieldType
> instance for numeric data.  It used to be self sufficient but now if we
> imagine the legacy stuff being removed, it's ambiguous.  In addition, it
> would be useful metadata if it found its way into FieldInfo.  Then, say
> Luke, could help you know what's there and maybe search it.
>
> Thoughts?
>
> ~ David
> --
> Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
> LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
> http://www.solrenterprisesearchserver.com
>
