Re: SPAM-HIGH: Disparity between API usage and Luke

Rob Cecil Tue, 26 Jun 2012 20:44:03 -0700

Yeah, sorry, I should have created 7 documents in the testindex - in my rush to 
get a standalone test done and emailed out I botched that. Thanks for the 
insight into the case issue with the KeywordAnalyzer. I'm starting to think 
about how I might structure my application to use the Query API in 
conjunction with the QueryParser. Still, QueryParser is very compelling.
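
For the archive, here is a minimal sketch of the two approaches discussed in 
this thread, assuming Lucene.Net 2.9.x. The class name PrefixSearchSketch, the 
in-memory RAMDirectory and the three sample Ids are made up for illustration. 
The call to SetLowercaseExpandedTerms(false) rests on my understanding that 
QueryParser lowercases prefix/wildcard terms by default; please verify it 
against the version you are running (later releases may expose it as a 
property instead of a method).

```csharp
using System;
using Lucene.Net.Analysis;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Lucene.Net.Store;
using Version = Lucene.Net.Util.Version; // avoid the clash with System.Version

// Hypothetical sketch, not the code from the original application.
public static class PrefixSearchSketch
{
    public static void Main()
    {
        var directory = new RAMDirectory(); // in-memory index, for illustration only
        var analyzer = new KeywordAnalyzer();
        var writer = new IndexWriter(directory, analyzer, true,
            IndexWriter.MaxFieldLength.LIMITED);

        // One document per Id, so that each matching Id counts as one hit.
        foreach (var id in new[] { "BAUERREVENUE", "BAUERLOCATION", "NOTBAUER" })
        {
            var document = new Document();
            document.Add(new Field("Id", id, Field.Store.YES,
                Field.Index.NOT_ANALYZED));
            writer.AddDocument(document);
        }
        writer.Close();

        var searcher = new IndexSearcher(directory, true);

        // Option 1: build the prefix query directly; no "*" and no parsing,
        // so the case of the term is left untouched.
        Query prefixQuery = new PrefixQuery(new Term("Id", "BAUER"));
        Console.WriteLine(searcher.Search(prefixQuery, 10).TotalHits);

        // Option 2: keep QueryParser, but ask it not to lowercase the
        // expanded (prefix/wildcard) terms. Assumption: 2.9.x method name.
        var parser = new QueryParser(Version.LUCENE_29, "Id", analyzer);
        parser.SetLowercaseExpandedTerms(false);
        Query parsedQuery = parser.Parse("Id:BAUER*");
        Console.WriteLine(searcher.Search(parsedQuery, 10).TotalHits);

        searcher.Close();
    }
}
```

If the parser setting behaves as assumed, both searches should report the same 
hit count, and the QueryParser syntax can stay in place without lowercasing 
the indexed Ids.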


Sent from my iPhone

On Jun 26, 2012, at 9:28 PM, "Lingam, ChandraMohan J" 
<chandramohan.j.lin...@intel.com> wrote:

> Interestingly, the query generated from var query = 
> queryParser.Parse("Id:BAUER*") is converted to lower case ("bauer*") even 
> though you are using KeywordAnalyzer.  I am not sure if this is the intended 
> behavior of the keyword analyzer.
> 
> So, the best option to make this example work is to index in lowercase:
>            document.Add(new Field("Id", "bauerrevenue", Field.Store.YES, 
> Field.Index.NOT_ANALYZED));
> 
> Also, the assert will always fail: even when the query matches, the hit 
> count will be 1, since there is only one document with several values 
> associated with the field.  You would need to iterate through the fields.  
> If you want to match 6 documents, then you have to add them as six separate 
> documents instead of one document with all the values.
> 
> 
> 
> 
> -----Original Message-----
> From: Rob Cecil [mailto:rob.ce...@gmail.com] 
> Sent: Tuesday, June 26, 2012 6:55 PM
> To: lucene-net-user@lucene.apache.org
> Subject: Re: SPAM-HIGH: Disparity between API usage and Luke
> 
> Sure, this is self-contained:
> 
> [Test]
>        public void QueryNonAnalyzedField()
>        {
>            var indexPath = Path.Combine(Environment.CurrentDirectory,
> "testindex");
>            var directory = FSDirectory.Open(new DirectoryInfo(indexPath));
>            var analyzer = new KeywordAnalyzer();
>            var writer = new IndexWriter(directory, analyzer, true, 
> IndexWriter.MaxFieldLength.LIMITED);
>            var document = new Document();
>            document.Add(new Field("Id", "BAUERREVENUE", Field.Store.YES, 
> Field.Index.NOT_ANALYZED));
>            document.Add(new Field("Id", "BAUERLOCATION", Field.Store.YES, 
> Field.Index.NOT_ANALYZED));
>            document.Add(new Field("Id", "BAUERPRODUCT", Field.Store.YES, 
> Field.Index.NOT_ANALYZED));
>            document.Add(new Field("Id", "BAUERPRODUCTLINE", Field.Store.YES, 
> Field.Index.NOT_ANALYZED));
>            document.Add(new Field("Id", "BAUERSTATE", Field.Store.YES, 
> Field.Index.NOT_ANALYZED));
>            document.Add(new Field("Id", "BAUERTOTAL", Field.Store.YES, 
> Field.Index.NOT_ANALYZED));
>            document.Add(new Field("Id", "NOTBAUER", Field.Store.YES, 
> Field.Index.NOT_ANALYZED));
>            writer.AddDocument(document);
>            writer.Optimize();
>            writer.Close();
> 
>            IndexReader reader = IndexReader.Open(directory, true);
>            var queryParser = new QueryParser(Version.LUCENE_29, "content", 
> analyzer);
>            var query = queryParser.Parse("Id:BAUER*");
>            var indexSearch = new IndexSearcher(reader);
>            var hits = indexSearch.Search(query);
>            Assert.AreEqual(6, hits.Length());
>        }
> 
> 
> On Tue, Jun 26, 2012 at 6:35 PM, Lingam, ChandraMohan J < 
> chandramohan.j.lin...@intel.com> wrote:
> 
>> Just did a simple test and KeywordAnalyzer does indeed work like a 
>> prefix query if you put a star at the end. Agree with Simon.  Most 
>> likely Luke was using the keyword analyzer and somehow the UI was not 
>> reflecting it?
>> 
>> Please post a small snippet of your index code and query code...
>> 
>> -----Original Message-----
>> From: Rob Cecil [mailto:rob.ce...@gmail.com]
>> Sent: Tuesday, June 26, 2012 5:25 PM
>> To: lucene-net-user@lucene.apache.org
>> Subject: Re: SPAM-HIGH: Disparity between API usage and Luke
>> 
>> Thanks, and there is no equivalent QueryParser syntax for that?
>> 
>> On Tue, Jun 26, 2012 at 6:21 PM, Lingam, ChandraMohan J < 
>> chandramohan.j.lin...@intel.com> wrote:
>> 
>>> Actually, that makes sense. KeywordAnalyzer would try for an exact match.
>>> Since you are looking for a prefix-based search, your best option is 
>>> to simply use PrefixQuery; there is no need to add a "*" for PrefixQuery.
>>> 
>>> -----Original Message-----
>>> From: Rob Cecil [mailto:rob.ce...@gmail.com]
>>> Sent: Tuesday, June 26, 2012 4:57 PM
>>> To: lucene-net-user@lucene.apache.org
>>> Subject: Re: SPAM-HIGH: Disparity between API usage and Luke
>>> 
>>> That is correct. I've verified in Luke 1.0.1 that both analyzers 
>>> produce the same results.
>>> 
>>> To make it interesting, back in my code, I switched over to using 
>>> the KeywordAnalyzer, and I'm still not getting any results against 
>>> that NOT_ANALYZED field.
>>> 
>>> ?
>>> 
>>> On Tue, Jun 26, 2012 at 5:52 PM, Lingam, ChandraMohan J < 
>>> chandramohan.j.lin...@intel.com> wrote:
>>> 
>>>> Luke using the keyword analyzer as its default makes sense. However, in 
>>>> the original post, there was a link to a Luke output screenshot 
>>>> which showed that the standard analyzer was in use for query parsing.
>>>> 
>>>> -----Original Message-----
>>>> From: Simon Svensson [mailto:si...@devhost.se]
>>>> Sent: Tuesday, June 26, 2012 2:56 PM
>>>> To: lucene-net-user@lucene.apache.org
>>>> Subject: Re: SPAM-HIGH: Disparity between API usage and Luke
>>>> 
>>>> Luke defaults to KeywordAnalyzer, which won't change your term in 
>>>> any way.
>>>> The QueryParser will still break up your query, so "Name:Jack Bauer"
>>>> would become (Name:Jack DefaultField:Bauer). I believe you can 
>>>> have per-field analyzers (KeywordAnalyzer for Id, StandardAnalyzer 
>>>> for everything else) using a PerFieldAnalyzerWrapper.
>>>> 
>>>> On 2012-06-26 23:06, Lingam, ChandraMohan J wrote:
>>>>> QueryParser has no knowledge of how data was indexed.  For your 
>>>>> scenario, I don't believe you would be able to use QueryParser 
>>>>> with the standard analyzer when the data was originally indexed with 
>>>>> the Field.Index.NOT_ANALYZED option.
>>>>> 
>>>>> The interesting question is why Luke is finding the match.  
>>>>> I would have expected Luke to not find any matches.
>>>>> 
>>>>> 
>>>>> -----Original Message-----
>>>>> From: Rob Cecil [mailto:rob.ce...@gmail.com]
>>>>> Sent: Tuesday, June 26, 2012 12:54 PM
>>>>> To: lucene-net-user@lucene.apache.org
>>>>> Subject: Re: SPAM-HIGH: Disparity between API usage and Luke
>>>>> 
>>>>> I can definitely try that. I just expected QueryParser would 
>>>>> respect the case of the source string. I was hoping to avoid using 
>>>>> the Query API per se, and just let the parser do the work for me.
>>>>> 
>>>>> On Tue, Jun 26, 2012 at 1:19 PM, Lingam, ChandraMohan J <
>>>> chandramohan.j.lin...@intel.com> wrote:
>>>>> 
>>>>>>>> var query = _parser.Parse("Id:BAUER*");
>>>>>> In your code, most likely, the value got converted to lower 
>>>>>> case (i.e. bauer*) by the Parse statement.
>>>>>> The indexed value, however, is in upper case, as it is not analyzed 
>>>>>> (from the screenshot).
>>>>>> 
>>>>>> Can you explicitly try using prefix query?
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> Same results, apparently, when I use Luke 1.0.1.
>>>>>>> 
>>>>>>> When I search for "Id:BAUER*" I get 15 hits in Luke, but in my 
>>>>>>> custom app, zero.
>>>>>>> 
>>>>>>> On Tue, Jun 26, 2012 at 12:31 PM, Rob Vesse 
>>>>>>> <rve...@dotnetrdf.org> wrote:
>>>>>>>> You appear to be using Luke 3.5, which per the information on 
>>>>>>>> the Luke homepage (http://code.google.com/p/luke/) uses 
>>>>>>>> Lucene 3.5.
>>>>>>>> 
>>>>>>>> Since Lucene.Net is currently on 2.9.4 I wouldn't be 
>>>>>>>> surprised to see different behavior between the API and executing in 
>>>>>>>> Luke.
>>>>>>>> 
>>>>>>>> If you use a version of Luke which more closely aligns with 
>>>>>>>> the version of Lucene.Net (Luke 1.0.1 uses Lucene 3.0.1, which 
>>>>>>>> should be close enough since the 2.9.x releases were previews of 
>>>>>>>> the 3.0.x releases, as I understood it), what behavior do you see?
>>>>>>>> 
>>>>>>>> Hope this helps,
>>>>>>>> 
>>>>>>>> Rob
>>>>>>>> 
>>>>>>>> On 6/26/12 10:50 AM, "Rob Cecil" <rob.ce...@gmail.com> wrote:
>>>>>>>> 
>>>>>>>>> If I run a query against my index using QueryParser to query 
>>>>>>>>> a field:
>>>>>>>>> 
>>>>>>>>>                var query = _parser.Parse("Id:BAUER*");
>>>>>>>>>                var topDocs = searcher.Search(query, 10);
>>>>>>>>>                Assert.AreEqual(count, topDocs.TotalHits);
>>>>>>>>> 
>>>>>>>>> I get 0 for my TotalHits, yet in Luke, the same query phrase 
>>>>>>>>> yields 15 results. What am I doing wrong? I use the 
>>>>>>>>> StandardAnalyzer both to create the index and to query.
>>>>>>>>> 
>>>>>>>>> The field is defined as:
>>>>>>>>> 
>>>>>>>>> new Field("Id", myObject.Id, Field.Store.YES,
>>>>>>>>> Field.Index.NOT_ANALYZED)
>>>>>>>>> 
>>>>>>>>> and is a string field. The result set back from Luke looks 
>>>>>>>>> like
>>>>>>>>> (screencap):
>>>>>>>>> 
>>>>>>>>> http://screencast.com/t/NooMK2Rf
>>>>>>>>> 
>>>>>>>>> Thanks!
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>> 
>>>> 
>>>> 
>>> 
>>
