[
https://issues.apache.org/jira/browse/LUCENE-1487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12655851#action_12655851
]
Tim Sturge commented on LUCENE-1487:
------------------------------------
I'm running a bit behind this week and I'm out most of next week so it may be a
while before I get to this.
One thing I hope will be helpful in the interim is to repost here the java-dev
exchange that lead to me posting this here; I suspect that many people who
watch JIRA don't necessarily read java-dev as well and I hope the postings are
informative.
Here's the exchange:
On 12/10/08 1:13 PM, "Tim Sturge" <[email protected]> wrote:
> Yes (mostly). It turns those terms into an OpenBitSet on the term array.
> Then it does a fastGet() in the next() and skipTo() loops to see if the term
> for that document is in the set.
>
> The issue is that fastGet() is not as fast as the two inequalities in FCRF.
> I didn't directly benchmark FCTF against FCRF because I had a different
> application in mind for FCTF (location boxes). However it wasn't as
> efficient in that case as directly realizing the bit sets. This was mostly
> because in the application I had in mind there were a lot (>100K) of terms
> with relatively low frequency and queries that needed only a few hundred
> terms in the set.
>
> I tried a sorted list of terms and Arrays.binarySearch() but that is way
> slower as is Set<Integer> (no surprise there). I was thinking about a custom
> hash table implementation but I'm not hopeful; it increases cycle cost and
> means
>
> So it is efficient but for a more limited set of cases than FCRF. My gut
> feeling is that FCRF is a better solution for "most" range filters, whereas
> FCTF is a better solution for "some" term set filters (versus creating
> TermsFilter objects on the fly each time) It all depends on how common the
> terms are and how large the sets of terms are. Lots of terms (or a few very
> common terms) it wins. A few less common terms it loses.
>
> I'll open a JIRA issue for it.
>
> Tim
>
> On 12/10/08 12:45 PM, "Michael McCandless" <[email protected]>
> wrote:
>
>>
>> It'd be great to get this into Lucene.
>>
>> Does FieldCacheTermsFilter let you specify a set of arbitrary terms to
>> filter for, like TermsFilter in contrib/queries? And it's space/time
>> efficient once FieldCache is populated?
>>
>> Mike
>>
>> Tim Sturge wrote:
>>
>>> Mike, Mike,
>>>
>>> I have an implementation of FieldCacheTermsFilter (which uses field
>>> cache to
>>> filter for a predefined set of terms) around if either of you are
>>> interested. It is faster than materializing the filter roughly when
>>> the
>>> filter matches more than 1% of the documents.
>>>
>>> So it's not better for a large set of small filters (which you can
>>> materialize on the spot) but it is better for a small set (but more
>>> than 32)
>>> large filters.
>>>
>>> Let me know if you're interested and I'll send it in.
>>>
>>> Tim
>>>
>>> On 12/10/08 3:34 AM, "Michael McCandless"
>>> <[email protected]> wrote:
>>>
>>>>
>>>> In your approach, roughly how many filters do you have cached? It
>>>> seems like it could be quite a few (one for each color, one for each
>>>> type, etc)?
>>>>
>>>> You might be able to modify the new (on Lucene trunk)
>>>> FieldCacheRangeFilter to achieve this same filtering without actually
>>>> having to materialize the full bitset for each.
>>>>
>>>> Mike
>>>>
>>>> Michael Stoppelman wrote:
>>>>
>>>>> Yeah looks similar to what we've implemented for ourselves
>>>>> (although I
>>>>> haven't looked at the implementation). We've got quite a custom
>>>>> version of
>>>>> lucene at this point. Using Solr at this point really isn't a viable
>>>>> option,
>>>>> but thanks for pointing this out.
>>>>>
>>>>> M
>>>>>
>>>>> On Tue, Dec 9, 2008 at 1:47 AM, Michael McCandless <
>>>>> [email protected]> wrote:
>>>>>
>>>>>>
>>>>>> This use case sounds alot like faceted navigation, which Solr
>>>>>> provides.
>>>>>>
>>>>>> Mike
>>>>>>
>>>>>>
>>>>>> Michael Stoppelman wrote:
>>>>>>
>>>>>> Hi all,
>>>>>>>
>>>>>>> I'm working on upgrading to Lucene 2.4.0 from 2.3.2 and was trying
>>>>>>> to
>>>>>>> integrate the new DodIdSet changes since
>>>>>>> o.a.l.search.Filter#bits() method
>>>>>>> is now depreciated. For our app we actually heavily rely on bits
>>>>>>> from the
>>>>>>> Filter to do post-query filtering (I explain why below).
>>>>>>>
>>>>>>> For example, if someone searches for product: "ipod" and then
>>>>>>> filters a
>>>>>>> type: "nano" (e.g. mini/nano/regular) AND color: "red" (e.g.
>>>>>>> red/yellow/blue). In our current model the results are gathered in
>>>>>>> the
>>>>>>> following way:
>>>>>>>
>>>>>>> 1) "ipod" w/o attributes is run and the results are stored in a
>>>>>>> hitcollector
>>>>>>> 2) "ipod" results are now filtered for color="red" AND type="mini"
>>>>>>> using
>>>>>>> the
>>>>>>> lucene Filters
>>>>>>> 3) The filtered results are returned to the user.
>>>>>>>
>>>>>>> The reason that the attributes are filtered post-query is so that
>>>>>>> we can
>>>>>>> return the other types and colors the user can filter by in the
>>>>>>> future.
>>>>>>> Meaning the UI would be able to show "blue", "green", "pink",
>>>>>>> etc... if we
>>>>>>> pre-filtered results by color and type before hand we wouldn't
>>>>>>> know what
>>>>>>> the
>>>>>>> other filter options would be there for a broader result set.
>>>>>>>
>>>>>>> Does anyone else have this use case? I'd imagine other folks are
>>>>>>> probably
>>>>>>> doing similar things to accomplish this.
>>>>>>>
>>>>>>> M
>>>>>>>
>>>>>>
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: [email protected]
>>>>>> For additional commands, e-mail: [email protected]
>>>>>>
>>>>>>
>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: [email protected]
>>>> For additional commands, e-mail: [email protected]
>>>>
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: [email protected]
>>> For additional commands, e-mail: [email protected]
>>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [email protected]
>> For additional commands, e-mail: [email protected]
>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
> FieldCacheTermsFilter
> ---------------------
>
> Key: LUCENE-1487
> URL: https://issues.apache.org/jira/browse/LUCENE-1487
> Project: Lucene - Java
> Issue Type: New Feature
> Components: Search
> Affects Versions: 2.4
> Reporter: Tim Sturge
> Fix For: 2.9
>
> Attachments: FieldCacheTermsFilter.java
>
>
> This is a companion to FieldCacheRangeFilter except it operates on a set of
> terms rather than a range. It works best when the set is comparatively large
> or the terms are comparatively common.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]