[ 
https://issues.apache.org/jira/browse/LUCENE-3593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156917#comment-13156917
 ] 

Uwe Schindler commented on LUCENE-3593:
---------------------------------------

Hi Simon,

The first patch looks exactly like what I described in our first exchange of 
ideas!

There are some smaller (but solvable) problems and one optimization (a tricky 
one): FieldCacheDocIdSet has some special cases which work with the 
implementation here, but they are unclean and should violate some assertions, 
so they should be fixed:

- FieldCacheDocIdSet expects that the match() method throws 
ArrayIndexOutOfBoundsException when the FieldCache array is accessed out of 
bounds. With the FixedBitSet behind that implementation of the FieldCache this 
basically works, but it should violate some code assertions added by Mike 
McCandless. (I am not sure why the testcase does not hit this, or does it? I 
assume it does not, because the trunk bits() on DocIdSet will intercept this: 
as our filter is not sparse, it switches to random access.)
- The FieldCacheDocIdSet should maybe be made non-private and refactored out of 
the FieldCacheRangeFilter.
- The positive case could be optimized: an instanceof check in the 
getDocIdSet() method could detect the positive case where the FieldCacheImpl 
itself already returns a FixedBitSet/DocIdSet, and return this directly:

{code:java}
final Bits docsWithField =
    FieldCache.DEFAULT.getDocsWithField(context.reader, field);
// this is always the case for our current impl - but who knows :-)
if (negate && docsWithField instanceof DocIdSet)
  return (DocIdSet) docsWithField;
{code}

In general, the other cases can easily be handled by the default stupid impl 
like you did (stupid in that it slowly iterates by doc++, and in trunk 
directly uses the Bits), but once the 
FieldCacheRangeFilter.FieldCacheDocIdSet is factored out we could optimize 
this and maybe have a better negation.
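
To make the "iterate by doc++" fallback concrete, here is a minimal sketch. It 
is not the code from the patch: java.util.BitSet stands in for the 
Bits/FixedBitSet returned by the FieldCache so that it runs without Lucene on 
the classpath, and the class and method names (FieldValueScanSketch, 
nextMatch, collect, wantValue) are hypothetical. It stops at an explicit 
maxDoc bound instead of relying on an ArrayIndexOutOfBoundsException, and uses 
a single flag to select documents with or without a value.

```java
import java.util.Arrays;
import java.util.BitSet;

/** Sketch only: java.util.BitSet is a stand-in for Lucene's Bits/FixedBitSet. */
final class FieldValueScanSketch {

    /**
     * Returns the first doc id >= from whose "has a value" bit equals
     * wantValue, or -1 once maxDoc is reached. The loop is bounded by maxDoc,
     * so no exception is needed to signal the end of the segment.
     */
    static int nextMatch(BitSet docsWithField, int maxDoc, int from, boolean wantValue) {
        for (int doc = Math.max(from, 0); doc < maxDoc; doc++) {
            if (docsWithField.get(doc) == wantValue) {
                return doc;
            }
        }
        return -1;
    }

    /** Collects all matching doc ids by stepping through nextMatch(). */
    static int[] collect(BitSet docsWithField, int maxDoc, boolean wantValue) {
        int[] buffer = new int[maxDoc];
        int count = 0;
        for (int doc = nextMatch(docsWithField, maxDoc, 0, wantValue);
             doc != -1;
             doc = nextMatch(docsWithField, maxDoc, doc + 1, wantValue)) {
            buffer[count++] = doc;
        }
        return Arrays.copyOf(buffer, count);
    }

    public static void main(String[] args) {
        BitSet docsWithField = new BitSet();
        docsWithField.set(0);
        docsWithField.set(2);
        docsWithField.set(3);
        int maxDoc = 6;
        // Documents that have a value in the field: prints [0, 2, 3]
        System.out.println(Arrays.toString(collect(docsWithField, maxDoc, true)));
        // Documents without a value in the field: prints [1, 4, 5]
        System.out.println(Arrays.toString(collect(docsWithField, maxDoc, false)));
    }
}
```

The scan costs O(maxDoc) per segment regardless of how many documents match, 
which is why returning the cached bit set directly is attractive whenever that 
is possible.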

In any case, I don't like the double negation in this Filter.

I'll work on the problems and make this filter work better. Should I take this 
issue and solve the problems first? I also want to backport the 
FieldCacheTermsFilter code-duplication removal in trunk to 3.x, so some cleanup 
is really needed!

I will come up with a patch addressing those problems later today or tomorrow.
                
> Add a filter returning all document without a value in a field
> --------------------------------------------------------------
>
>                 Key: LUCENE-3593
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3593
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: core/search
>    Affects Versions: 3.6, 4.0
>            Reporter: Simon Willnauer
>             Fix For: 3.6, 4.0
>
>         Attachments: LUCENE-3593.patch
>
>
> In some situations it would be useful to have a Filter that simply returns 
> all document that either have at least one or no value in a certain field. We 
> don't have something like that out of the box and adding it seems straight 
> forward.

