After applying the patch that I submitted for LUCENENET-277,
most of the tests under TestQueryParser run and pass.

 

                Two notable standouts are TestDateRange and
TestLegacyDateRange.

 

                If you apply the patches from LUCENENET-278, the tests
still fail, but now they fail because the text representations of the
queries don't match, not because the queries can't be created
(LUCENENET-278 addresses the issue of not being able to create a range query
from a date, which is the first step to getting these tests to pass).

 

                This doesn't seem to be a conversion issue so much as a
bad test case.  The test cases compare the inclusive and exclusive forms of
the range query (using "[]" and "{}" respectively) with dates as the bounds.
However, in order to test for the exclusive case (curly brackets).

 

                For example, in the TestDateRange test, the following query
is generated for the inclusive case (en-US date format):

 

default:[1/1/2002 TO 1/4/2002]

 

                But in generating the result to compare against, it uses a
time of 1/4/2002 23:59:59.999.

 

                I find this to be wrong.

 

                First, for an inclusive range of dates, the logical
comparison should be:

 

1/1/2002 <= datevalue AND datevalue < 1/5/2002

 

                Note that the second comparison is the NEXT day, along with
a less than comparison.  Since you can approach the next day in infinitely
decreasing increments, but never actually get to the next day, this
comparison is future-proof in all cases, no matter what the resolution is
when it comes to the measurement of time.  Using one millisecond before
midnight of the next day is an error in the test.
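
A minimal sketch of the difference, in Python rather than the Lucene.Net
code under test.  The function names are hypothetical, and the microsecond
timestamp is an assumption chosen only to show a value finer than one
millisecond:

```python
from datetime import date, datetime, time, timedelta

def in_range_half_open(value, first_day, last_day):
    """Inclusive date range as first_day <= value < (last_day + 1 day)."""
    lower = datetime.combine(first_day, time.min)
    upper = datetime.combine(last_day + timedelta(days=1), time.min)
    return lower <= value < upper

def in_range_end_of_day(value, first_day, last_day):
    """Inclusive date range with the upper bound fixed at 23:59:59.999."""
    lower = datetime.combine(first_day, time.min)
    upper = datetime.combine(last_day, time(23, 59, 59, 999000))
    return lower <= value <= upper

# A timestamp inside the last millisecond of 1/4/2002 (microsecond resolution).
late = datetime(2002, 1, 4, 23, 59, 59, 999500)

print(in_range_half_open(late, date(2002, 1, 1), date(2002, 1, 4)))   # True
print(in_range_end_of_day(late, date(2002, 1, 1), date(2002, 1, 4)))  # False
```

The half-open form keeps the value regardless of clock resolution, while
the 23:59:59.999 bound drops anything recorded at a finer resolution.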

 

                This is possibly an issue in Lucene itself.

 

                Basically, the question is, for inclusive ranges involving
dates, is this test case correct?  I would say no, since this is what the
documentation at
http://lucene.apache.org/java/2_3_2/queryparsersyntax.html#Range Searches
states (emphasis mine):

 

Range Searches

Range Queries allow one to match documents whose field(s) values are between
the lower and upper bound specified by the Range Query. Range Queries can be
inclusive or exclusive of the upper and lower bounds. Sorting is done
lexicographically.

mod_date:[20020101 TO 20030101]

 

                If the sorting is done lexicographically, then in the
example above, wouldn't any value on 1/1/2003 with a time component later
than midnight fall outside this range?  In other words, a value of
20030101120000000 (noon on 1/1/2003) would not be included, since
lexicographically it comes after 20030101.
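
To illustrate with plain string comparison (a sketch only: it assumes the
terms are compared as ordinary strings and does not use Lucene's actual
term encoding; the helper name is hypothetical):

```python
def in_inclusive_range(term, lower, upper):
    """Lexicographic test for an inclusive range [lower TO upper]."""
    return lower <= term <= upper

lower, upper = "20020101", "20030101"

# Midnight on 1/1/2003, encoded with the same precision as the bound.
print(in_inclusive_range("20030101", lower, upper))           # True

# Noon on 1/1/2003: the bound is a prefix of this longer string,
# so the longer string sorts after it and falls outside the range.
print(in_inclusive_range("20030101120000000", lower, upper))  # False
```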

 

                That being said, why is 1/4/2002 23:59:59.999 used as the
expected upper bound for the inclusive case in this test?  Shouldn't it just
be 1/4/2002 (converted into an encoded value, of course), with the bracket
format deciding the rest?

 

                                - Nick
