> > But adding a new type should be the last resort.
I did not realize that was the case; that's good to know. It seems like I should just use BDV (which does make the code change easier and faster, so I have no issues with it). As for Patrick's suggestion of using separate numeric fields instead of packing them together, that does sound like an interesting idea. I think the biggest issue with it would be implementing a multivalued version. As Robert pointed out, we would need an UnsortedNumericDV.

Thanks for all the feedback!

On Wed, May 25, 2022 at 8:17 AM Robert Muir <rcm...@gmail.com> wrote:
> On Wed, May 25, 2022 at 12:17 AM Greg Miller <gsmil...@gmail.com> wrote:
> >
> > A "two separate field approach" would
> > consist of indexing year and make separately, and you'd lose the
> > information that only certain combinations are valid. Am I overlooking
> > something with your suggestion? Maybe there's something we can do with
> > Lucene already that solves for this case and I'm just not aware of it?
> > That's entirely possible and I'd love to learn more if there is!
>
> This makes no sense to me. If there are two dimensions, there's no
> difference in faceting code calling fieldA.value and fieldB.value,
> than calling field.valueA and field.valueB.
>
> In other words, doesn't make any sense to needlessly "pack dimensions
> together" at docvalues level, especially for what should be a
> column-stride field. There's really no difference from the app
> perspective. Any issues you have here seem to be issues around facet
> module and not docvalues...
> >
> > As for MultiRangeQuery and the mention of sandbox modules, I think
> > that's a bit of a different use-case. MultiRangeQuery lets you filter
> > by a disjunction of ranges. The "multi" part doesn't relate to
> > "multiple values in a doc" (but it does support that, as do the
> > "standard" range queries).
> >
> > Where I see a gap right now, beyond just faceting, is that we can
> > represent N-dim points in the points index and filter on them (using
> > the points index), but we have no doc values equivalent. This means,
> > 1) we can't facet, and 2) we can't create a "slow" query that does
> > post-filtering instead of using the points index (which could be a
> > very real advantage in cases with a sparse match set but a dense
> > points index). So I like the idea of creating that concept and being
> > able to facet and filter on it. Whether-or-not this is a "formal" doc
> > values type or sits on top of BDV, I have less of a strong opinion.
>
> We shouldn't add new docvalues types because of "slow queries", I'm
> really against that. The root problem is that points impl can't filter
> well (like the inverted index can), and as a hack, docvalues "picks up
> the slack". If its becoming a major issue, address this with points
> directly?
> >
> > And finally... it really should be multi-valued. The points index
> > supports multiple points-per-field within a single document. Seems
> > like a big gap that we wouldn't support that with a doc value field.
> > Because BDV is inherently single-valued, I propose we come up with an
> > encoding scheme that encodes multiple points on top of that "single"
> > BDV entry. This is where building on BDV started to feel a little icky
> > to me and it seemed like it might be a good use-case for actually
> > formalizing a format/encoding, but again, no strong preference. We
> > could certainly do something more quickly on top of BDV and formalize
> > an encoding later if/as necessary.
>
> Doesn't matter that points index supports it. Do the use-cases make
> sense? It's especially stupid that e.g. LatLonDocValueField supports
> multi-values. Really? What kind of quantum documents are in multiple
> locations at the same time?
>
> The sortedset/sortednumeric exist to support use-cases on String and
> int, where user wants to "sort on a multivalued field", which is
> really crazy if you think about it. So they both sort the numbers at
> index-time, so that you can pick a "representative" value
> (min/max/median) in constant time. I think a lot of this existing
> stuff is just brain-damage from the no-sql fads, alternatively we
> could remove this multivalued nonsense and the crazy servers that want
> to follow no-sql fads could index just the "representative value"
> (min/max/median) in a single-valued field.
>
> Sorry, I'm just not seeing a lot of strong use-cases here to justify
> creating a new DV field, which we should really avoid, as its a hugely
> expensive cost. I would recommend prototyping stuff with
> BinaryDocValues, using the sandbox, etc. See if the features get
> popular and people use them.
>
> If they really "catch on", and we think its more efficient, then we
> can think about how the stuff could be best encoded/compressed/etc.
> But adding a new type should be the last resort. Adding some
> specialized multi-dimensional type is IMO out of the question. It
> would be a lot less horrible to just use separate DV fields, one for
> each dimension. If there is *strong* compelling use-cases for
> multi-valued stuff, then in the worst case we could think about
> something like a UnsortedNumericDV, which would allow fieldA[0] to
> align with fieldB[0] and fieldA[1] to align with fieldB[1], which
> would solve the issue for faceting. Just don't allow sorting. And
> probably not any "slow" query stuff too.
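
For reference, here is a rough sketch of the kind of encoding discussed above for prototyping on top of a single BDV entry: several 2-dim points packed into one byte[]. The class name PackedPoints, the count-prefixed layout, and the fixed 8 bytes per dimension are all assumptions made for illustration, not an existing Lucene format. The sketch uses only the JDK; in a real prototype the resulting bytes would presumably be wrapped in a BytesRef and indexed through a BinaryDocValuesField.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: packs several 2-dim points into one binary value. */
final class PackedPoints {

    /** Layout (assumed): 4-byte point count, then two 8-byte longs per point. */
    static byte[] encode(List<long[]> points) {
        ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES + points.size() * 2 * Long.BYTES);
        buf.putInt(points.size());
        for (long[] p : points) {
            buf.putLong(p[0]); // first dimension, e.g. year
            buf.putLong(p[1]); // second dimension, e.g. an ordinal for make
        }
        return buf.array();
    }

    /** Reverses encode(); points come back in their original order, unsorted. */
    static List<long[]> decode(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        int count = buf.getInt();
        List<long[]> points = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            long first = buf.getLong();
            long second = buf.getLong();
            points.add(new long[] {first, second});
        }
        return points;
    }
}
```

Because each point keeps its two dimensions side by side, the valid (year, make) combinations survive the round trip, which is the property the faceting case needs. A fixed-width layout like this is easy to prototype with but makes no attempt at compression.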