Michael,

        Thanks for the response.  It's somewhat of a moot point now, as the
error in the test was something else entirely.  Still, it seems that
Lucene.NET (as it stands now), when filtering on date fields with an
inclusive end date, internally generates a value for the last millisecond
of that day and uses it as the upper bound (after conversion).

        After the split from Java, this should probably be addressed: date
ranges with an inclusive end date should internally translate that date to
the next day and make the upper bound of the range query exclusive, which
is a semantically cleaner approach.

        Getting the last millisecond of the day just reeks. =)

                - Nick

-----Original Message-----
From: Michael Garski [mailto:mgar...@myspace-inc.com] 
Sent: Wednesday, November 18, 2009 12:56 PM
To: lucene-net-dev@incubator.apache.org
Subject: RE: TestDateRange and TestLegacyDateRange - Do they pass in Java,
if so, how?

Great points, Nick.  I wanted to think about this overnight and take a
fresh look at it in the morning, and here's my take on it.

 

TestLegacyDateRange tests the now-deprecated way of expressing date &
time information in documents.  The DateField type goes away in 3.0 in
favor of DateTools.

 

The inclusive case of "default:[1/1/2002 TO 1/4/2002]" tests to verify
that all of the dates fall within the range inclusively.  The time of
1/4/2002 23:59:59.999 is truncated to just the date portion of 1/4/2002.

 

The logical comparison of

 

1/1/2002 <= datevalue AND datevalue < 1/5/2002

 

would be the equivalent of the range "default:[1/1/2002 TO 1/5/2002}",
inclusive on the lower bound and exclusive on the upper bound.  Both
cases are supported, and the choice of approach is left to the user of
Lucene.

 

Legacy date ranges are compared on the string value of the date
(lexicographically), so if you had a query of

 

mod_date:[20020101 TO 20030101]

 

and a document with a mod_date field value of 20030101120000000, which
technically falls into the range (noon on the inclusive end date), it
would never surface due to the differing date resolution: as a string,
it sorts after 20030101.

 

Michael

 

From: Nicholas Paldino [.NET/C# MVP] [mailto:casper...@caspershouse.com]
Sent: Tuesday, November 17, 2009 7:14 PM
To: lucene-net-dev@incubator.apache.org
Subject: TestDateRange and TestLegacyDateRange - Do they pass in Java,
if so, how?

 

                After applying the patch that I submitted for
LUCENENET-277, most of the tests under TestQueryParser run and pass.

 

                Two notable standouts are TestDateRange and
TestLegacyDateRange.

 

                If you apply the patches from LUCENENET-278, then the
tests still do not pass, but they do not pass because the text
representations of the queries don't match, not because the queries
can't be created (LUCENENET-278 addresses the issue of not being able to
create a range query from a date, which is the first step to getting
these tests to pass).

 

                There doesn't seem to be a conversion issue here so much
as a bad test case.  The test cases compare the inclusive and exclusive
forms of the range query (using "[]" and "{}" respectively) with dates.
However, in order to test for the exclusive case (curly brackets).

 

                For example, in the TestDateRange test, the following
query is generated for the inclusive case (en-US date format):

 

default:[1/1/2002 TO 1/4/2002]

 

                But in generating the result to compare against, it uses
a time of 1/4/2002 23:59:59.999.

 

                I find this to be wrong.

 

                First, for an inclusive range of dates, the logical
comparison should be:

 

1/1/2002 <= datevalue AND datevalue < 1/5/2002

 

                Note that the second comparison is the NEXT day, along
with a less than comparison.  Since you can approach the next day in
infinitely decreasing increments, but never actually get to the next
day, this comparison is future-proof in all cases, no matter what the
resolution is when it comes to the measurement of time.  Using one
millisecond before midnight of the next day is an error in the test.
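
                To make the resolution argument concrete, here is a
minimal sketch in Python (not Lucene code; the function names and the
sub-millisecond sample value are made up for illustration).  It compares
an upper bound of 23:59:59.999 against the next-day exclusive bound:

```python
from datetime import date, datetime, time, timedelta

def in_range_last_millisecond(value, start_day, end_day):
    """Inclusive range whose upper bound is 23:59:59.999 on end_day."""
    lower = datetime.combine(start_day, time.min)
    upper = datetime.combine(end_day, time(23, 59, 59, 999000))
    return lower <= value <= upper

def in_range_half_open(value, start_day, end_day):
    """Inclusive date range expressed as start <= value < end_day + 1 day."""
    lower = datetime.combine(start_day, time.min)
    upper = datetime.combine(end_day + timedelta(days=1), time.min)
    return lower <= value < upper

start, end = date(2002, 1, 1), date(2002, 1, 4)

# A timestamp on the last day with finer-than-millisecond resolution.
late = datetime(2002, 1, 4, 23, 59, 59, 999500)

print(in_range_last_millisecond(late, start, end))  # False
print(in_range_half_open(late, start, end))         # True
```

                The half-open form keeps working if the resolution of
the stored values ever becomes finer, while the last-millisecond bound
silently drops values that fall after it on the same day.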

 

                This is possibly an issue in Lucene itself.

 

                Basically, the question is: for inclusive ranges
involving dates, is this test case correct?  I would say no, since this
is what the documentation at
http://lucene.apache.org/java/2_3_2/queryparsersyntax.html#Range Searches
states (emphasis mine):

 

Range Searches

Range Queries allow one to match documents whose field(s) values are
between the lower and upper bound specified by the Range Query. Range
Queries can be inclusive or exclusive of the upper and lower bounds.
*Sorting is done lexicographically.*

mod_date:[20020101 TO 20030101]

 

                If the sorting is done lexicographically, then in the
example above, wouldn't any value on 1/1/2003 with a time component
later than midnight fall outside the range?  In other words, if you had
a value of 20030101120000000 (noon on 1/1/2003), it would not be
included, since lexicographically it comes after 20030101.
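
                A quick way to check this ordering, sketched in Python
with plain strings (an assumption for illustration; these are not
Lucene's encoded terms, and the helper name is made up):

```python
def in_inclusive_range(value, lower, upper):
    """Inclusive range test using plain string (lexicographic) ordering."""
    return lower <= value <= upper

lower, upper = "20020101", "20030101"

# The bare upper-bound date itself is inside the inclusive range.
print(in_inclusive_range("20030101", lower, upper))           # True

# Noon on the same day is a longer string with the same prefix,
# so it sorts after the upper bound and is excluded.
print(in_inclusive_range("20030101120000000", lower, upper))  # False
```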

 

                That being said, why is 1/4/2002 23:59:59.999 being used
as the expected upper bound for inclusive values in this test?  Shouldn't
it just be 1/4/2002 (converted into an encoded value, of course), letting
the bracket format decide the rest?

 

                                - Nick
