Re: Bug? QueryParser may not correctly interpret RangeQuery text

Eric D. Friedman Sun, 02 Jun 2002 21:19:45 -0700

Instead of reinventing the wheel for representing dates, how about
using an existing standard?  ISO 8601 defines a simple lexical
representation for dates, times (with optional millisecond precision),
and timezones that is easy to implement.  This is what's used in the
XML Schema "dateTime" datatype.


A summary of the ISO 8601 notation is available here:
http://www.cl.cam.ac.uk/~mgk25/iso-time.html

The documentation for the XML Schema dateTime datatype is here:
http://www.w3.org/TR/xmlschema-2/#dateTime
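
To make the lexical form concrete, here is a minimal sketch that reads the same kind of strings with the JDK's java.time classes (a much later addition to Java than this thread, used here only for illustration). The class name Iso8601Demo and the helper toUtc are made up for this example and are not part of the attachment. Note that OffsetDateTime.parse requires an explicit offset or "Z", so a timestamp without a zone indicator would need separate handling.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.OffsetDateTime;

class Iso8601Demo {
    // Hypothetical helper: parses a timestamp that carries an explicit
    // offset (or "Z") and normalizes it to a UTC instant.
    public static Instant toUtc(String text) {
        return OffsetDateTime.parse(text).toInstant();
    }

    public static void main(String[] args) {
        // 13:20 at eight hours behind UTC is 21:20 UTC.
        System.out.println(toUtc("1999-05-31T13:20:00.999-08:00")); // 1999-05-31T21:20:00.999Z
        System.out.println(toUtc("1999-05-31T13:20:00Z"));          // 1999-05-31T13:20:00Z
        // A date with no time part uses the plain CCYY-MM-DD form.
        System.out.println(LocalDate.parse("1999-05-31"));          // 1999-05-31
    }
}
```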

I whipped up a JavaCC parser to handle this lexical representation (see
attachment).

Note that for this to be useful in QueryParser, it's going to need its
own lexical state.  This makes sense anyway, since it would be a
mistake to have the query syntax infer magical properties about strings
that appear to be dates.  Better is to have a keyword in the query
syntax that introduces a date value:  something like date(<VALUE>)
would work.  So would to_date(<VALUE>) for those who know SQL. I would
have suggested date:<VALUE> but I think that already means something in
the QueryParser's lexical specification. (I don't actually use
QueryParser because the patches I've submitted previously haven't made
it in yet, and until they do, QP is fatally crippled for my purposes).
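
As a rough illustration of the keyword idea, the sketch below pulls the value out of a date(<VALUE>) term and leaves bare strings alone, so nothing is inferred from a string that merely looks like a date. It uses a regular expression rather than a JavaCC lexical state, and the class DateKeywordDemo, the method dateValue and the exact date(...) spelling are assumptions for the example, not existing QueryParser syntax.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DateKeywordDemo {
    // Hypothetical syntax: date(<VALUE>), where <VALUE> contains no ')'.
    private static final Pattern DATE_CALL = Pattern.compile("date\\(([^)]*)\\)");

    // Returns the text inside date(...), or null when the term is not
    // written with the explicit keyword.
    public static String dateValue(String term) {
        Matcher m = DATE_CALL.matcher(term);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(dateValue("date(1999-05-31T13:20:00Z)")); // 1999-05-31T13:20:00Z
        System.out.println(dateValue("1999-05-31"));                 // null
    }
}
```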

Eric

On Sun, 2 Jun 2002, Peter Carlson wrote:

> I like this idea of [GOOP:GOOP] as it gives the most flexibility. However,
> this requires the field to have a known characteristic, like a date field,
> number field or text field, correct? If you just use the static Field.Date,
> would this require adding a new attribute to the Field class? I like this idea
> but I don't know the difficulty / backward compatibility issues.
>
> If the extra field attribute is too difficult, then I suggest we use the
> nnnn-nn-nn format method so we can use the pattern to determine the data
> type.
>
> For number fields, should this support only integers, or decimal numbers
> too?
>
> I don't think we should use the : character, because we probably want to
> support time formats in the date format. Something like 03/01/2001 at
> 00:01:00. Maybe something like ">" or "|" or even "->" ?
>
> Also, inclusive vs. exclusive should be accounted for with the [ vs {
> characters.  I think this might already be done, but just wanted to throw it
> out there.
>
> --Peter
>
>
> On 6/2/02 2:13 AM, "Brian Goetz" <[EMAIL PROTECTED]> wrote:
>
> >>> How about:
> >>>
> >>>  DATE = nnnn-nn-nn
> >>>  NUMBER = n*
> >>>  RANGE = [ DATE : DATE ] | [ NUMBER : NUMBER ]
> >>>
> >>> An alternate, less parse-oriented approach would be this:
> >>>   RANGE = [ GOOP : GOOP ]
> >>> where
> >>>   GOOP = any string of letters/numbers not containing : or ].
> >>
> >> I'd go for the first one as it's more explicit.  However, perhaps the
> >> second approach is more extensible?
> >
> > When I first did the query parser, I defined terms by inclusion
> > (stating valid characters) instead of exclusion (excluding non-term
> > characters.)  Turns out I missed quite a few in the first go around,
> > which taught me the lesson (again) that sometimes trying to be too
> > > specific is a rat's nest.  What about dates like 02-Mai-2002 (not a
> > > typo, French for May)?  Letting DateFormat figure it out has some
> > merit.
> >
> >> DateField(Date) and NumberField(int) sounds right, but wouldn't Field
> >> class make more sense?
> >
> > I had in mind static methods of Field, just like Field.Text --
> > Field.Date, Field.Number.   Sorry if that wasn't clear.  This seems
> > an easy addition.
> >
> > --
> > To unsubscribe, e-mail:   <mailto:[EMAIL PROTECTED]>
> > For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>
> >
> >
>
>

PARSER_BEGIN(ISO8601Parser)

import java.io.*;
import java.util.*;
import java.text.*;

public class ISO8601Parser {

  static DateFormat fmt;

  public static void main(String args[]) throws ParseException {
    String date;

    //date = "1999-05-31T13:20:00Z";
    //date = "1999-05-31T13:20:00-00:01";
    date = "1999-05-31T13:20:00.999-08:00";

    TimeZone utc = TimeZone.getTimeZone("UTC");
    fmt = DateFormat.getDateTimeInstance();
    fmt.setTimeZone(utc);

    ISO8601Parser parser = new ISO8601Parser(new StringReader(date));
    Date d = parser.date();
    System.out.println(fmt.format(d));
  }
}

PARSER_END(ISO8601Parser)

TOKEN :
{
  <#DIGIT: ["0"-"9"]>
| <TWOD: <DIGIT><DIGIT>>         // two digits used for day, month, hours, minutes, seconds
| <MILLIS: <TWOD><DIGIT>>        // millisecond precision is 000 .. 999
| <YEAR: <TWOD><TWOD>(<DIGIT>)*> // at least 4 digits, but possibly more
| <DASH: "-">                    // delimiter for CCYY-MM-DD; doubles as minus sign for signed ints
| <COLON: ":">                   // delimiter for hh:mm:ss
| <DOT: ".">                     // delimiter for ss.mmm (milliseconds)
| <T: "T" >                      // delimiter between date and time
| <Z: "Z" >                      // UTC timezone
| <PLUS: "+">                    // indicates positive offset from UTC
}

/**
 * Input to this production is a series of tokens matching the following specification:
 * CCYY-MM-DD -- a date with no time specification<br>
 * CCYY-MM-DDThh:mm:ss -- a timestamp implicitly in the UTC timezone<br>
 * CCYY-MM-DDThh:mm:ssZ -- a timestamp explicitly in the UTC timezone<br>
 * CCYY-MM-DDThh:mm:ss-08:00 -- a timestamp with a negative 8 hour offset from UTC<br>
 * CCYY-MM-DDThh:mm:ss.mmm -- a timestamp with millisecond precision<br>
 * -CCYY-MM-DD -- a date whose year is before the common era (BCE)<br>
 * NNCCYY-MM-DD -- a date whose year is > 9999<br>
 *
 * <p> Note that years greater than 9999 are allowed, but that 0000 is not a valid year.
 * Negative numbers are allowed when representing years BCE.
 * </p>
 *
 * <p>Milliseconds are optional in the seconds field.  The timezone indicator is optional.
 * </p>
 *
 * @return a java.util.Date for the parsed instant, normalized to UTC, with millisecond precision.
 */
Date date() :
{
  int CCYY = 0, MM = 0, DD = 0, hh = 0, mm = 0, ss = 0, millis = 0;
  int deltahh = 0, deltamm = 0;
  boolean deltaPlus = true;
  Calendar c = Calendar.getInstance(TimeZone.getTimeZone("UTC"));
}
{
  CCYY = year() <DASH>
  MM = twod() <DASH>
  DD = twod()
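  {
    // getInstance() pre-loads the current time; clear it so that fields
    // absent from the input (time of day, milliseconds) default to zero
    c.clear();
    MM--; // Calendar months are 0 based
    c.set(c.YEAR, CCYY);
    c.set(c.MONTH, MM);
    c.set(c.DAY_OF_MONTH, DD);
  }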
  (
    <T>
    hh = twod() <COLON>
    mm = twod() <COLON>
    ss = twod()
    {
      c.set(c.HOUR_OF_DAY, hh);
      c.set(c.MINUTE, mm);
      c.set(c.SECOND, ss);
    }
    (
      <DOT>
      millis = millis()
      {
        c.set(c.MILLISECOND, millis);
      }
    )?
    (
      <Z> // we're already in UTC, so no adjustment needed
      |
      (
        (
          <PLUS> // somewhere ahead of UTC (east of Greenwich)
          |
          <DASH> // behind UTC (west of Greenwich)
          {
            deltaPlus = false;
          }
        )
        deltahh = twod() <COLON>
        deltamm = twod()
        {
          if (! deltaPlus) {
            deltahh = -deltahh;
            deltamm = -deltamm;
          }
          // millisecond offset
          int offsetFromUTC = ((deltahh * 60) + deltamm) * 60 * 1000;
          c.set(c.ZONE_OFFSET, offsetFromUTC);
        }
      )
    )?
  )?
  {
    return c.getTime();
  }
}

int millis() :
{
  Token t;
}
{
  t = <MILLIS> {
    return Integer.parseInt(t.image);
  }
}

int twod() :
{
  Token t;
}
{
  t = <TWOD> {
    return Integer.parseInt(t.image);
  }
}

int year() :
{
  Token t;
  boolean positive = true;
}
{
  (
    <DASH>
    {
      positive = false;
    }
  )?
  t = <YEAR> {
    int year = Integer.parseInt(t.image);
    if (year == 0) {
      throw new IllegalArgumentException("0000 is not a legal year");
    }
    return positive ? year : -year;
  }
}
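
The time-zone handling in date() can be checked outside the grammar with plain java.util.Calendar. The sketch below repeats the same arithmetic (signed hours and minutes converted to milliseconds, then stored in ZONE_OFFSET) for the sample timestamp 1999-05-31T13:20:00. The class ZoneOffsetDemo and its two methods are names invented for this example.

```java
import java.util.Calendar;
import java.util.TimeZone;

class ZoneOffsetDemo {
    // Same arithmetic as the grammar: signed hours and minutes in milliseconds.
    public static int offsetMillis(boolean plus, int hh, int mm) {
        int minutes = hh * 60 + mm;
        return (plus ? minutes : -minutes) * 60 * 1000;
    }

    // Interprets 1999-05-31T13:20:00 as local time at the given offset
    // and returns the corresponding epoch milliseconds.
    public static long sampleEpochMillis(int offsetMillis) {
        Calendar c = Calendar.getInstance(TimeZone.getTimeZone("UTC"));
        c.clear(); // start from empty fields rather than the current time
        c.set(1999, Calendar.MAY, 31, 13, 20, 0);
        c.set(Calendar.ZONE_OFFSET, offsetMillis);
        return c.getTimeInMillis();
    }

    public static void main(String[] args) {
        int minusEight = offsetMillis(false, 8, 0);
        System.out.println(minusEight); // -28800000
        // A time stamped -08:00 is eight hours later when expressed in UTC.
        System.out.println(sampleEpochMillis(minusEight) - sampleEpochMillis(0)); // 28800000
    }
}
```

Because ZONE_OFFSET is subtracted from the local fields when the Calendar computes its time, a negative offset moves the resulting instant later, which is the behaviour the grammar relies on.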
