[jira] [Commented] (LUCENE-373) Query parts ending with a colon are handled badly

Harish Kayarohanam (JIRA) Mon, 27 Jul 2015 21:46:12 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643840#comment-14643840
 ]


Harish Kayarohanam commented on LUCENE-373:
-------------------------------------------

Here is my understanding of the issue above, and an analysis of whether it
really needs a fix.

section 1:
==========
>>> If queryString is "title: search", there's no exception. However, the parsed
>>> query which is returned is "title:search". 
This is as expected.

section 2:
==========
>>> If queryString is "title: contents: text", 
>>> the parsed query is "title:contents" and the "text" part is ignored 
>>> completely. 
This needs revisiting. Maybe we should bring in something like chained
assignment: in Java, Python, JavaScript or Ruby, a = b = 2 assigns 2 to both
a and b.
A similar approach can be followed here. This is discussed in detail later in
my answer (see sections 5 & 7).

section 3:
==========
>>> When queryString is "title: text contents:" the above exception is
>>> produced again.
This is also expected. It breaks the syntax.
Why? And why should this not be considered a bug?
We should accept that the Lucene query language is a language of its own, with
its own syntax, so we should obey that syntax.
And I would say that it is a meaningful syntax. It is not weird.
Why do I make that statement?
Let us see what happens in other programming languages (say Python, Java,
JavaScript or Ruby).
Say
a = ;   (or just a = )
is an error (unexpected end of input). Similarly,
 = 2;
is an error. This is common to almost all languages, and expected.
Why is this the expected behaviour? The idea is:
1) if you assign something to nothing, it is an error:   = 2
2) if you assign nothing to something, it is an error:  a = 
 
Now let us come to the Lucene context:
 = something
This raises the question "what should we search something against: the default
field, or something else?" This is meaningless, so the Lucene developers made
the best choice in treating it as an error and throwing a ParseException.
something = 
What should we search for in the field something? We should not infer any
value unless told explicitly, so here too the Lucene developers made the best
choice in treating it as an error and throwing a ParseException. I personally
like the decision made.

section 4:
==========
>>> This seems inconsistent. Given that it's pointless searching for an empty
>>> string (since it has no tokens), I'd expect both "search title:" & "title:
>>> search" to be parsed as "search" (or, given the default field I specified,
>>> "contents:search"), 
search title:  
is the case explained above. I like the present syntax, as it is best for a
syntax not to assume anything unless it is stated explicitly. It is like the
cases
 = 2 
a = 
where we cannot assume either the field or the term. So it should be a
ParseException, and that is what we get now.

"title: search" overrides the default field and searches in the title field.
This is as per the design. It cannot be parsed as just "search" on the default
field, since that would break the original design. Please refer to the Fields
section in
http://lucene.apache.org/core/5_2_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#package_description.
 

section 5:
===========
>>> and "title: contents: text" 
This seems meaningful, at least to me. I would not say it is right or wrong;
it is about what we want, what most people want, and what seems meaningful.
If we want, we can bring in a new syntax. Again, I would like to look at how
other programming languages handle a similar syntax.
In Java, Python, JavaScript and Ruby,
a = b = c = 2
is allowed, and it assigns the value 2 to a, b and c.
So here too we can have a syntax that makes the term text be searched in the
fields title and contents. This is a choice we can make if the present
behaviour is confusing.
I feel, as the person who reported this issue says, that silently ignoring
something the user gave seems unfair. This is just my point of view.
If the community takes the stand that this breaks the syntax and we don't want
this new syntax, we should at least throw an exception.
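
For reference, the chained assignment I am using as the analogy looks like this
in Java. This is only a small illustration; the class and method names
(ChainedAssignmentDemo, assignAll) are made up for the example.

```java
final class ChainedAssignmentDemo {

    /** Assigns one value to three variables in a single statement. */
    static int[] assignAll(int value) {
        int a, b, c;
        // Assignment associates right to left: c = value, then b = c, then a = b.
        a = b = c = value;
        return new int[] {a, b, c};
    }

    public static void main(String[] args) {
        // prints [2, 2, 2]
        System.out.println(java.util.Arrays.toString(assignAll(2)));
    }
}
```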

section 6:
==========
>>> "title: text contents:" to
>>> parse as "text" ("contents:text") i.e. parts which have no term are 
>>> ignored. At
>>> worst I'd expect them all to throw a ParseException rather than just the 
>>> ones
>>> with the colon at the end of the string.
Please see my explanation above. By my reasoning, this need not be considered
a bug.

Note: I am looking at the syntax of other programming languages just to see
which design has stood the test of time, so that I can infer what most people
expect and find least confusing. These programming languages have evolved
over time, so we can take their syntax as a reference for what is most
expected. I personally would like to go by the most common expectations.
Please correct me if I am wrong.



section 7:
==========
Further discussion on section 5:
Let us see whether the new syntax can work in the Lucene query language, and
how it can work without ambiguity.
a : b : hello world h: when
hello will be searched in the fields named a and b
world will be searched in the default field
when will be searched in the field named h

Whenever and wherever there are statements like the following
1) field names with no terms --   a:
2) terms intended to be assigned (with :) but with no field name --  : hello
they should be flagged as errors.
(The query parser already does this. That is, it does not just look for a :
at the beginning or end of the query and flag the error. This is good. Even if
I have statements within brackets, like (fieldname:) or (:termvalue), it flags
an error.)
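
To make the proposal concrete, here is a minimal sketch of the rules above as a
toy parser. It is not Lucene code and does not use the QueryParser grammar. The
class name ChainedFieldSketch, the parse method and its defaultField parameter
are hypothetical, and it assumes plain whitespace-separated terms with no
brackets, quotes, wildcards or analyzers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Toy parser for the chained-field proposal; hypothetical, not Lucene code. */
final class ChainedFieldSketch {

    // A token is a single colon, or a run of characters with no whitespace or colon.
    private static final Pattern TOKEN = Pattern.compile(":|[^\\s:]+");

    /** Returns one "field:term" clause per (field, term) pair, in input order. */
    static List<String> parse(String query, String defaultField) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(query);
        while (m.find()) {
            tokens.add(m.group());
        }

        List<String> clauses = new ArrayList<>();
        List<String> pendingFields = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            String token = tokens.get(i);
            if (token.equals(":")) {
                // Colons after a field name are skipped below, so this one has no field.
                throw new IllegalArgumentException("':' without a field name at token " + i);
            }
            boolean isField = i + 1 < tokens.size() && tokens.get(i + 1).equals(":");
            if (isField) {
                pendingFields.add(token);
                i++; // skip the colon that follows the field name
            } else if (pendingFields.isEmpty()) {
                clauses.add(defaultField + ":" + token);
            } else {
                // The term is assigned to every field in the chain, like a = b = 2.
                for (String field : pendingFields) {
                    clauses.add(field + ":" + token);
                }
                pendingFields.clear();
            }
        }
        if (!pendingFields.isEmpty()) {
            throw new IllegalArgumentException("field(s) " + pendingFields + " have no term");
        }
        return clauses;
    }

    public static void main(String[] args) {
        // prints [a:hello, b:hello, contents:world, h:when]
        System.out.println(parse("a : b : hello world h: when", "contents"));
    }
}
```

With this sketch, "title: contents: text" gives title:text and contents:text,
while "search title:" and ": hello" are both rejected with an exception.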

The above in sections 5 & 7 is just a proposal. Please give your comments. Feel
free to point out mistakes.
If this syntax is expected to have a bad impact on performance, then it need
not go in.

I referred to
http://lucene.apache.org/core/5_2_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#package_description
for a better understanding.

> Query parts ending with a colon are handled badly
> -------------------------------------------------
>
>                 Key: LUCENE-373
>                 URL: https://issues.apache.org/jira/browse/LUCENE-373
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/queryparser
>    Affects Versions: 1.4
>         Environment: Operating System: Windows 2000
> Platform: PC
>            Reporter: Andrew Stevens
>            Priority: Minor
>              Labels: newdev
>
> I'm using Lucene 1.4.3, running
> Query query = QueryParser.parse(queryString, "contents", new 
> StandardAnalyzer());
> If queryString is "search title:" i.e. specifying a field name without a
> corresponding value, I get a parsing exception:
> Encountered "<EOF>" at line 1, column 8.
> Was expecting one of:
>     "(" ...
>     <QUOTED> ...
>     <TERM> ...
>     <PREFIXTERM> ...
>     <WILDTERM> ...
>     "[" ...
>     "{" ...
>     <NUMBER> ...
> If queryString is "title: search", there's no exception.  However, the parsed
> query which is returned is "title:search".  If queryString is "title: 
> contents:
> text", the parsed query is "title:contents" and the "text" part is ignored
> completely.  When queryString is "title: text contents:" the above exception 
> is
> produced again.
> This seems inconsistent.  Given that it's pointless searching for an empty
> string (since it has no tokens), I'd expect both "search title:" & "title:
> search" to be parsed as "search" (or, given the default field I specified,
> "contents:search"), and "title: contents: text" & "title: text contents:" to
> parse as "text" ("contents:text") i.e. parts which have no term are ignored.  
> At
> worst I'd expect them all to throw a ParseException rather than just the ones
> with the colon at the end of the string.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
