Re: Getting multi-values to use in filter?

Rob Audenaerde Wed, 23 Apr 2014 08:50:37 -0700

Thanks for all the questions, gives me an opportunity to clarify it :)

I want the user to be able to give a (simple) formula (so I don't know it
in advance) and use that formula in the search. The JavaScript
expressions are really powerful in this use case, but have the single-value
limitation. Ideally, I would like to make it really flexible by, for example,
allowing (in-document aggregating) expressions like: max(fieldA) - fieldB >
fieldC.
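
To make the intended semantics concrete, here is a minimal plain-Java sketch of
evaluating "max(fieldA) - fieldB > fieldC" for one document. It does not use
any Lucene API. The class InDocumentAggregates, the Map-based document
representation, and the choice to treat fieldB and fieldC as single-valued
(first value only) are all assumptions made for illustration.

```java
import java.util.Map;

/**
 * Sketch only: evaluates max(fieldA) - fieldB > fieldC for one document.
 * InDocumentAggregates is a hypothetical helper, not a Lucene class.
 */
final class InDocumentAggregates {

    private InDocumentAggregates() {}

    /** In-document aggregate: the largest of a field's values. */
    static long max(long[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("no values");
        }
        long best = values[0];
        for (long value : values) {
            best = Math.max(best, value);
        }
        return best;
    }

    /**
     * Returns true if the document satisfies max(fieldA) - fieldB > fieldC.
     * Assumption: fieldB and fieldC are single-valued, so only their first
     * value is read. A document missing any of the fields does not match.
     */
    static boolean matches(Map<String, long[]> doc) {
        long[] a = doc.get("fieldA");
        long[] b = doc.get("fieldB");
        long[] c = doc.get("fieldC");
        if (a == null || b == null || c == null
                || a.length == 0 || b.length == 0 || c.length == 0) {
            return false;
        }
        return max(a) - b[0] > c[0];
    }
}
```

For a document with fieldA = {3, 10, 7}, fieldB = {4} and fieldC = {5}, the
expression is 10 - 4 > 5, so matches() returns true.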


Currently, using single values, I can handle expressions in the form of
"fieldA - fieldB - fieldC > 0" and evaluate the long value that I receive
from the FunctionValues and the ValueSource. I also optimize the query by
ensuring the field exists and has a value, etc., to keep the search fast
enough. This works well, but for single values only.

I also looked into the facet association fields, as they look somewhat
like the thing that I want. However, in the faceting module all ordinals and
values are stored in one field, so there is no easy way to extract the fields
that are used in the expression.

I like the solution you suggested: add for each numeric field an
encoded byte[], like the facets do, but on a per-field basis, so that
each numeric field has a BDV field that contains all the values for
that field for that document.
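
As a rough illustration of the per-field encoding, the plain-Java sketch below
packs all values of one field into a single byte[] and reads them back. It uses
only the JDK. The class name MultiValueCodec and the fixed-width 8-byte layout
are assumptions chosen for simplicity, and a real index would more likely use a
compact variable-length encoding. In Lucene the resulting bytes would go into
something like new BinaryDocValuesField(name, new BytesRef(bytes)), and the
offset/length parameters mirror how a BytesRef exposes a slice.

```java
import java.nio.ByteBuffer;

/**
 * Sketch only: packs all values of one numeric field for one document
 * into a single byte[]. MultiValueCodec is a hypothetical name, not a
 * Lucene class.
 */
final class MultiValueCodec {

    private MultiValueCodec() {}

    /** Encodes the values as consecutive fixed-width 8-byte big-endian longs. */
    static byte[] encode(long... values) {
        ByteBuffer buffer = ByteBuffer.allocate(values.length * Long.BYTES);
        for (long value : values) {
            buffer.putLong(value);
        }
        return buffer.array();
    }

    /** Decodes the slice [offset, offset + length) written by encode(). */
    static long[] decode(byte[] bytes, int offset, int length) {
        if (length % Long.BYTES != 0) {
            throw new IllegalArgumentException(
                "length must be a multiple of 8: " + length);
        }
        ByteBuffer buffer = ByteBuffer.wrap(bytes, offset, length);
        long[] values = new long[length / Long.BYTES];
        for (int i = 0; i < values.length; i++) {
            values[i] = buffer.getLong();
        }
        return values;
    }

    /** Sums all values in the slice, e.g. to test sum(field) = 100. */
    static long sum(byte[] bytes, int offset, int length) {
        long total = 0;
        for (long value : decode(bytes, offset, length)) {
            total += value;
        }
        return total;
    }
}
```

With field=60 and field=40 in one document, encode(60, 40) produces a 16-byte
payload and sum() over that payload returns 100, which is the value the
original sum(field)=100 filter would compare against.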

Now that I am typing this, I think there is another way. I could use the
faceting module and add a different facet field ($facetFIELDA,
$facetFIELDB) in the FacetsConfig for each field. That way it would be
relatively straightforward to get all the values for a field, as they are
exactly the values in the BDV for that document's facet field. Only
aggregating all facets will be harder, as the TaxonomyFacetSum*Associations
would need to do this for all fields that I need facet counts/sums for.

What do you think?

-Rob


On Wed, Apr 23, 2014 at 5:13 PM, Shai Erera <ser...@gmail.com> wrote:

> A NumericDocValues field can only hold one value. Have you thought about
> encoding the values in a BinaryDocValues field? Or are you talking about
> multiple fields (different names), each has its own single value, and at
> search time you sum the values from a different set of fields?
>
> If it's one field, multiple values, then why do you need to separate the
> values? Is it because you sometimes sum and sometimes e.g. avg? Do you
> always include all values of a document in the formula, but the formula
> changes between searches, or do you sometimes use only a subset of the
> values?
>
> If you always use all values, but change the formula between queries, then
> perhaps you can just encode the pre-computed value under different NDV
> fields? If you only use a handful of functions (and they are known in
> advance), it may not be too heavy on the index, and definitely perform
> better during search.
>
> Otherwise, I believe I'd consider indexing them as a BDV field. For facets,
> we basically need the same multi-valued numeric field, and given that NDV
> is single valued, we went w/ BDV.
>
> If I misunderstood the scenario, I'd appreciate if you clarify it :)
>
> Shai
>
>
> On Wed, Apr 23, 2014 at 5:49 PM, Rob Audenaerde <rob.audenae...@gmail.com>
> wrote:
>
> > Hi Shai, all,
> >
> > I am trying to write that Filter :). But I'm a bit at a loss as to how to
> > efficiently grab the multi-values. I can access the
> > context.reader().document() that accesses the stored fields, but that
> > seems slow.
> >
> > For single-value fields I use a compiled JavaScript Expression with
> > SimpleBindings as ValueSource, which seems to work quite well. The
> > downside is that I cannot find a way to implement multi-value through
> > that solution.
> >
> > These create for example a LongFieldSource, which uses the
> > FieldCache.LongParser. These parsers only seem to parse one field.
> >
> > Is there an efficient way to get -all- of the (numeric) values for a
> > field in a document?
> >
> >
> > On Wed, Apr 23, 2014 at 4:38 PM, Shai Erera <ser...@gmail.com> wrote:
> >
> > > You can do that by writing a Filter which returns matching documents
> > > based on a sum of the field's value. However I suspect that is going
> > > to be slow, unless you know that you will need several such filters
> > > and can cache them.
> > >
> > > Another approach would be to write a Collector which serves as a
> > > Filter, but computes the sum only for documents that match the query.
> > > Hopefully that would mean you compute the sum for fewer documents than
> > > you would have w/ the Filter approach.
> > >
> > > Shai
> > >
> > >
> > > On Wed, Apr 23, 2014 at 5:11 PM, Michael Sokolov <
> > > msoko...@safaribooksonline.com> wrote:
> > >
> > > > This isn't really a good use case for an index like Lucene.  The most
> > > > essential property of an index is that it lets you look up documents
> > > > very quickly based on *precomputed* values.
> > > >
> > > > -Mike
> > > >
> > > >
> > > > On 04/23/2014 06:56 AM, Rob Audenaerde wrote:
> > > >
> > > >> Hi all,
> > > >>
> > > >> I'm looking for a way to use multi-values in a filter.
> > > >>
> > > > >> I want to be able to search on sum(field)=100, where field has
> > > > >> these values in one document:
> > > >>
> > > >> field=60
> > > >> field=40
> > > >>
> > > > >> In this case 'field' is a LongField. I examined the code in the
> > > > >> FieldCache, but that seems to focus on single-valued fields only, or
> > > >>
> > > >>
> > > > >> Is this something that can be done in Lucene? And what would be a
> > > > >> good approach?
> > > >>
> > > >> Thanks in advance,
> > > >>
> > > >> -Rob
> > > >>
> > > >>
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > > > For additional commands, e-mail: java-user-h...@lucene.apache.org
> > > >
> > > >
> > >
> >
>
