Re: SPAM-HIGH: Disparity between API usage and Luke

Rob Cecil Wed, 27 Jun 2012 09:19:51 -0700

And the prize goes to Simon for figuring out why Luke behaved differently.
Indeed, Luke appears to configure its QueryParser with
SetLowercaseExpandedTerms set to false by default. Check out this screenshot:


http://screencast.com/t/zb2jNT3wAM

Notice that the "Lowercase expanded terms..." checkbox is unchecked.
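A minimal sketch of the difference, assuming the Lucene.Net 2.9.x API used in this thread (where `SetLowercaseExpandedTerms` is a method; later versions may expose it differently). It builds an in-memory `RAMDirectory` index with one upper-case `NOT_ANALYZED` Id per document and counts hits for `Id:BAUER*` with the parser's lowercasing on and off. The class name `LowercaseExpandedTermsDemo` and the `CountHits` helper are hypothetical.

```csharp
using Lucene.Net.Analysis;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Lucene.Net.Store;
using Version = Lucene.Net.Util.Version; // avoid a clash with System.Version

public static class LowercaseExpandedTermsDemo
{
    // Hypothetical helper: returns the hit count for Id:BAUER* with the given parser setting.
    public static int CountHits(bool lowercaseExpandedTerms)
    {
        // In-memory index, one document per Id (values taken from the test in this thread).
        var directory = new RAMDirectory();
        var analyzer = new KeywordAnalyzer();
        var writer = new IndexWriter(directory, analyzer, true,
                                     IndexWriter.MaxFieldLength.LIMITED);
        string[] ids = { "BAUERREVENUE", "BAUERLOCATION", "BAUERPRODUCT",
                         "BAUERPRODUCTLINE", "BAUERSTATE", "BAUERTOTAL", "NOTBAUER" };
        foreach (var id in ids)
        {
            var doc = new Document();
            // NOT_ANALYZED: the upper-case value is indexed as a single exact term.
            doc.Add(new Field("Id", id, Field.Store.YES, Field.Index.NOT_ANALYZED));
            writer.AddDocument(doc);
        }
        writer.Close();

        var parser = new QueryParser(Version.LUCENE_29, "content", analyzer);
        // true (the parser default) rewrites BAUER* to bauer*; false keeps the prefix
        // as typed, like Luke with "Lowercase expanded terms" unchecked.
        parser.SetLowercaseExpandedTerms(lowercaseExpandedTerms);
        Query query = parser.Parse("Id:BAUER*");

        var searcher = new IndexSearcher(directory, true);
        int hits = searcher.Search(query, 10).TotalHits;
        searcher.Close();
        return hits;
    }

    public static void Main()
    {
        System.Console.WriteLine("lowercase expanded terms on:  " + CountHits(true));
        System.Console.WriteLine("lowercase expanded terms off: " + CountHits(false));
    }
}
```

With the default setting the parser rewrites the prefix to `bauer*`, which matches none of the upper-case terms, so the first count should be 0; with lowercasing turned off the six BAUER-prefixed Ids should match, mirroring Luke's behavior.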

On Wed, Jun 27, 2012 at 10:05 AM, Rob Cecil <[email protected]> wrote:

> Thanks Simon, that works - even with StandardAnalyzer! :)
>
>
> On Tue, Jun 26, 2012 at 11:44 PM, Simon Svensson <[email protected]> wrote:
>
>> Call queryParser.SetLowercaseExpandedTerms(false);
>>
>>
>> On 2012-06-27 03:55, Rob Cecil wrote:
>>
>>> Sure, this is self-contained:
>>>
>>> [Test]
>>> public void QueryNonAnalyzedField()
>>> {
>>>     var indexPath = Path.Combine(Environment.CurrentDirectory, "testindex");
>>>     var directory = FSDirectory.Open(new DirectoryInfo(indexPath));
>>>     var analyzer = new KeywordAnalyzer();
>>>     var writer = new IndexWriter(directory, analyzer, true,
>>>         IndexWriter.MaxFieldLength.LIMITED);
>>>
>>>     // One document per Id, so that six documents can match Id:BAUER*
>>>     // (adding all seven values to a single document yields at most one hit).
>>>     var ids = new[] { "BAUERREVENUE", "BAUERLOCATION", "BAUERPRODUCT",
>>>         "BAUERPRODUCTLINE", "BAUERSTATE", "BAUERTOTAL", "NOTBAUER" };
>>>     foreach (var id in ids)
>>>     {
>>>         var document = new Document();
>>>         document.Add(new Field("Id", id, Field.Store.YES, Field.Index.NOT_ANALYZED));
>>>         writer.AddDocument(document);
>>>     }
>>>     writer.Optimize();
>>>     writer.Close();
>>>
>>>     IndexReader reader = IndexReader.Open(directory, true);
>>>     var queryParser = new QueryParser(Version.LUCENE_29, "content", analyzer);
>>>     var query = queryParser.Parse("Id:BAUER*");
>>>     var indexSearch = new IndexSearcher(reader);
>>>     var topDocs = indexSearch.Search(query, 10);
>>>     Assert.AreEqual(6, topDocs.TotalHits);
>>> }
>>>
>>>
>>> On Tue, Jun 26, 2012 at 6:35 PM, Lingam, ChandraMohan J <[email protected]> wrote:
>>>
>>>> Just did a simple test, and KeywordAnalyzer does indeed work like a
>>>> prefix query if you put a star at the end. Agree with Simon. Most likely
>>>> Luke was using KeywordAnalyzer and somehow the UI was not reflecting it?
>>>>
>>>> Please post a small snippet of your index code and query code...
>>>>
>>>> -----Original Message-----
>>>> From: Rob Cecil [mailto:[email protected]]
>>>> Sent: Tuesday, June 26, 2012 5:25 PM
>>>> To: [email protected]
>>>> Subject: Re: SPAM-HIGH: Disparity between API usage and Luke
>>>>
>>>> Thanks. Is there no equivalent QueryParser syntax for that?
>>>>
>>>> On Tue, Jun 26, 2012 at 6:21 PM, Lingam, ChandraMohan J <[email protected]> wrote:
>>>>
>>>>> Actually, that makes sense. KeywordAnalyzer would try for an exact match.
>>>>> Since you are looking for a prefix-based search, your best option is to
>>>>> use PrefixQuery directly; there is no need to append a "*" for a
>>>>> PrefixQuery.
>>>>>
>>>>> -----Original Message-----
>>>>> From: Rob Cecil [mailto:[email protected]]
>>>>> Sent: Tuesday, June 26, 2012 4:57 PM
>>>>> To: [email protected]
>>>>> Subject: Re: SPAM-HIGH: Disparity between API usage and Luke
>>>>>
>>>>> That is correct. I've verified in Luke 1.0.1 that both analyzers
>>>>> produce the same results.
>>>>>
>>>>> To make it interesting, back in my code, I switched over to using the
>>>>> KeywordAnalyzer, and I'm still not getting any results against that
>>>>> NOT_ANALYZED field.
>>>>>
>>>>> ?
>>>>>
>>>>> On Tue, Jun 26, 2012 at 5:52 PM, Lingam, ChandraMohan J <[email protected]> wrote:
>>>>>
>>>>>> Luke using KeywordAnalyzer as its default makes sense. However, the
>>>>>> original post linked to a Luke screenshot showing that StandardAnalyzer
>>>>>> was in use for query parsing.
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Simon Svensson [mailto:[email protected]]
>>>>>> Sent: Tuesday, June 26, 2012 2:56 PM
>>>>>> To: [email protected]
>>>>>> Subject: Re: SPAM-HIGH: Disparity between API usage and Luke
>>>>>>
>>>>>> Luke defaults to KeywordAnalyzer, which won't change your term in any
>>>>>> way.
>>>>>>
>>>>>> The QueryParser will still break up your query, so "Name:Jack Bauer"
>>>>>> would become (Name:Jack DefaultField:Bauer). I believe you can have
>>>>>> per-field analyzers (KeywordAnalyzer for Id, StandardAnalyzer for
>>>>>> everything else) using a PerFieldAnalyzerWrapper.
>>>>>>
>>>>>> On 2012-06-26 23:06, Lingam, ChandraMohan J wrote:
>>>>>>
>>>>>>> QueryParser has no knowledge of how data was indexed. For your
>>>>>>> scenario, I don't believe you would be able to use QueryParser with
>>>>>>> StandardAnalyzer when the data was originally indexed with the
>>>>>>> Field.Index.NOT_ANALYZED option.
>>>>>>>
>>>>>>> The interesting question is: why is Luke finding the match? I would
>>>>>>> have expected Luke not to find any matches.
>>>>>>
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Rob Cecil [mailto:[email protected]]
>>>>>>> Sent: Tuesday, June 26, 2012 12:54 PM
>>>>>>> To: [email protected]
>>>>>>> Subject: Re: SPAM-HIGH: Disparity between API usage and Luke
>>>>>>>
>>>>>>> I can definitely try that. I just expected QueryParser would respect
>>>>>>> the case of the source string. I was hoping to avoid using the Query
>>>>>>> API per se, and just let the parser do the work for me.
>>>>>>
>>>>>>> On Tue, Jun 26, 2012 at 1:19 PM, Lingam, ChandraMohan J <[email protected]> wrote:
>>>>>>
>>>>>>>>> var query = _parser.Parse("Id:BAUER*");
>>>>>>>>
>>>>>>>> In your code, most likely, the value got converted to lower case
>>>>>>>> (i.e. bauer*) by the Parse statement, whereas the indexed value is
>>>>>>>> in upper case because it is not analyzed (from the screenshot).
>>>>>>>>
>>>>>>>> Can you explicitly try using a PrefixQuery?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>  Same results, apparently, when I use Luke 1.0.1.
>>>>>>>>>
>>>>>>>>> When I search for "Id:BAUER*" I get 15 hits in Luke, but in my
>>>>>>>>> custom app, zero.
>>>>>>>>>
>>>>>>>>> On Tue, Jun 26, 2012 at 12:31 PM, Rob Vesse <[email protected]> wrote:
>>>>>>>>
>>>>>>>>>> You appear to be using Luke 3.5, which per the information on the
>>>>>>>>>> Luke homepage (http://code.google.com/p/luke/) uses Lucene 3.5.
>>>>>>>>>>
>>>>>>>>>> Since Lucene.Net is currently on 2.9.4 I wouldn't be surprised
>>>>>>>>>> to see different behavior between the API and executing in Luke.
>>>>>>>>>>
>>>>>>>>>> If you use a version of Luke which more closely aligns with the
>>>>>>>>>> version of Lucene.Net (Luke 1.0.1 uses Lucene 3.0.1, which should be
>>>>>>>>>> close enough since the 2.9.x releases were previews of the 3.0.x
>>>>>>>>>> releases as I understood it), what behavior do you see?
>>>>>>>>>>
>>>>>>>>>> Hope this helps,
>>>>>>>>>>
>>>>>>>>>> Rob
>>>>>>>>>>
>>>>>>>>>> On 6/26/12 10:50 AM, "Rob Cecil" <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> If I run a query against my index using QueryParser to query a field:
>>>>>>>>>>>
>>>>>>>>>>>                 var query = _parser.Parse("Id:BAUER*");
>>>>>>>>>>>                 var topDocs = searcher.Search(query, 10);
>>>>>>>>>>>                 Assert.AreEqual(count, topDocs.TotalHits);
>>>>>>>>>>>
>>>>>>>>>>> I get 0 for my TotalHits, yet in Luke the same query phrase yields
>>>>>>>>>>> 15 results. What am I doing wrong? I use the StandardAnalyzer
>>>>>>>>>>> both to create the index and to query.
>>>>>>>>>>>
>>>>>>>>>>> The field is defined as:
>>>>>>>>>>>
>>>>>>>>>>> new Field("Id", myObject.Id, Field.Store.YES,
>>>>>>>>>>> Field.Index.NOT_ANALYZED)
>>>>>>>>>>>
>>>>>>>>>>> and is a string field. The result set back from Luke looks
>>>>>>>>>>> like
>>>>>>>>>>> (screencap):
>>>>>>>>>>>
>>>>>>>>>>> http://screencast.com/t/NooMK2Rf
>>>>>>>>>>>
>>>>>>>>>>> Thanks!
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>
>>>>>>
>>
>>
>
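The two alternatives suggested in the thread, a hand-built `PrefixQuery` and a `PerFieldAnalyzerWrapper` with `KeywordAnalyzer` on the `Id` field, can be sketched the same way. This again assumes the Lucene.Net 2.9.x API and reuses the sample Ids from the test in the thread; the class name `PrefixAlternativesDemo` and its helpers are hypothetical.

```csharp
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Lucene.Net.Store;
using Version = Lucene.Net.Util.Version; // avoid a clash with System.Version

public static class PrefixAlternativesDemo
{
    static readonly string[] Ids = { "BAUERREVENUE", "BAUERLOCATION", "BAUERPRODUCT",
                                     "BAUERPRODUCTLINE", "BAUERSTATE", "BAUERTOTAL", "NOTBAUER" };

    // Builds an in-memory index with one upper-case NOT_ANALYZED Id per document.
    static RAMDirectory BuildIndex()
    {
        var directory = new RAMDirectory();
        var writer = new IndexWriter(directory, new KeywordAnalyzer(), true,
                                     IndexWriter.MaxFieldLength.LIMITED);
        foreach (var id in Ids)
        {
            var doc = new Document();
            doc.Add(new Field("Id", id, Field.Store.YES, Field.Index.NOT_ANALYZED));
            writer.AddDocument(doc);
        }
        writer.Close();
        return directory;
    }

    static int Count(RAMDirectory directory, Query query)
    {
        var searcher = new IndexSearcher(directory, true);
        int hits = searcher.Search(query, 10).TotalHits;
        searcher.Close();
        return hits;
    }

    // Option 1: build the PrefixQuery directly. The prefix text is used exactly
    // as given, so there is no "*" to write and no lowercasing step.
    public static int CountWithPrefixQuery()
    {
        return Count(BuildIndex(), new PrefixQuery(new Term("Id", "BAUER")));
    }

    // Option 2: KeywordAnalyzer for "Id", StandardAnalyzer for every other field,
    // so an exact Id lookup keeps its upper-case term.
    public static int CountExactId(bool usePerFieldAnalyzer)
    {
        Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_29);
        if (usePerFieldAnalyzer)
        {
            var wrapper = new PerFieldAnalyzerWrapper(analyzer);
            wrapper.AddAnalyzer("Id", new KeywordAnalyzer());
            analyzer = wrapper;
        }
        var parser = new QueryParser(Version.LUCENE_29, "content", analyzer);
        return Count(BuildIndex(), parser.Parse("Id:BAUERREVENUE"));
    }

    public static void Main()
    {
        System.Console.WriteLine("PrefixQuery(Id:BAUER):   " + CountWithPrefixQuery());
        System.Console.WriteLine("StandardAnalyzer only:   " + CountExactId(false));
        System.Console.WriteLine("PerFieldAnalyzerWrapper: " + CountExactId(true));
    }
}
```

Under these assumptions the counts should be 6 for the `PrefixQuery`, 0 for `Id:BAUERREVENUE` parsed with `StandardAnalyzer` alone (the term is lowercased to `bauerrevenue`), and 1 with the wrapper. The wrapper only changes how analyzed terms are handled; a parsed prefix such as `Id:BAUER*` is still lowercased unless `SetLowercaseExpandedTerms(false)` is also called.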
