> Brian, here is another idea for the query parser. To add the ability to mark
> terms as 'non analyzed'.
>
> For example
>
> +body:xyz +folder:a.b.c.d
>
> when 'folder' is a non tokenized field will not match if a.b.c.d is
> tokenized.
>
> A possible syntax may be
>
> +body:xyz +folder:'a.b.c.d'

I understand the desire for such a feature (someone else suggested the
same thing.) I am very wary of creating "new syntax" about which
you'll have to educate your users. I know it sounds like you're only
asking for one feature, but if you think it'll be the last "special
case" that someone wants, well, I don't believe you. I can't think of
any syntax that will clearly and unambiguously indicate "no
tokenization please."

It's one thing to add a syntax for the boost stuff, which only very
advanced users will use, but this is something that might be expected
of relatively beginning users -- "you have to put the author's name in
single quotes, but the article title in double quotes." No way.

I think the request for this underscores an issue that's been bugging
me for a while -- since it's so important that you use the same
analyzer for queries as for indexing, maybe the analyzer should
actually be stored in the index store.

I could see two ways to address this issue:

1 (complicated way): When the index store is created, register an
analyzer for each field (could be the same one.) A serialized copy of
the analyzer is stored in the index base, and queries on that field
are automatically processed with it.

2 (simpler, less complete way): Have a way of telling the query parser
that "these fields use these analyzers", or at the very least, "these
fields don't get tokenized with an analyzer."
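
To make option 2 concrete, here is a rough sketch of a per-field lookup
that a query parser could consult before turning query text into terms.
Everything in it is hypothetical and is not existing Lucene API: the
class PerFieldAnalysisSketch, the methods tokenize, keepWhole, register
and analyze, and the crude "analyzer" that lowercases and splits on
non-alphanumeric characters are all stand-ins for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: maps field names to the analysis applied to query text.
final class PerFieldAnalysisSketch {

    // Stand-in for a tokenizing analyzer (an assumption, not a Lucene class):
    // lowercases the text and splits it on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String part : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!part.isEmpty()) {
                terms.add(part);
            }
        }
        return terms;
    }

    // "No tokenization please": the whole value becomes a single term.
    static List<String> keepWhole(String text) {
        return List.of(text);
    }

    private final Map<String, Function<String, List<String>>> byField = new HashMap<>();
    private final Function<String, List<String>> fallback;

    PerFieldAnalysisSketch(Function<String, List<String>> fallback) {
        this.fallback = fallback;
    }

    // Declares that "this field uses this analyzer".
    PerFieldAnalysisSketch register(String field, Function<String, List<String>> analyzer) {
        byField.put(field, analyzer);
        return this;
    }

    // What a query parser would call for each field:text clause.
    List<String> analyze(String field, String text) {
        return byField.getOrDefault(field, fallback).apply(text);
    }

    public static void main(String[] args) {
        PerFieldAnalysisSketch fields =
                new PerFieldAnalysisSketch(PerFieldAnalysisSketch::tokenize)
                        .register("folder", PerFieldAnalysisSketch::keepWhole);

        System.out.println(fields.analyze("body", "xyz"));       // prints [xyz]
        System.out.println(fields.analyze("folder", "a.b.c.d")); // prints [a.b.c.d]
    }
}
```

With a table like this, +body:xyz +folder:a.b.c.d needs no new syntax:
the user types the same thing either way, and the parser decides per
field whether to tokenize. Option 1 would amount to persisting the same
table alongside the index instead of handing it to the parser.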

> BTW, it will be great if the syntax of the query parser will allow
> to describe any query that is supported by Lucene standard
> classes. This will provide a common language to describe queries and
> will provide an alternative, and more intuitive, way to construct
> queries.

Nice goal, and I'm happy to try for it if practical, but I think a
more important rule is that the syntax should be simple and hard to
mess up. I would -1 adding any syntax which will only be used by 5%
of the users, but which might confuse the other 95%, and the same with
any syntax which will be widely used but which requires more than a
sentence or two of explanation to the "average user." Remember, the
people who create these queries are used to using Google; we should
support a query language which is familiar (or at least easily
explained) to those users. Advanced users can still create their own
with the query classes.
_______________________________________________
Lucene-dev mailing list
[EMAIL PROTECTED]
http://lists.sourceforge.net/lists/listinfo/lucene-dev