If you're using StandardAnalyzer what's the reference to CustomAnalyzerForCaseSensitive all about? Someone else with more patience or better diagnostic skill may well spot your problem but I can't.
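To see why that analyzer question matters, here is a plain-Java sketch of the terms each choice would put in the index. This is a model only, not Lucene code: the `\W+` split is a crude stand-in for StandardTokenizer. StandardAnalyzer lowercases its tokens, while the thread's CustomAnalyzerForCaseSensitive has the LowerCaseFilter commented out and so keeps case:

```java
import java.util.*;

public class AnalyzerMismatchSketch {
    // Rough stand-in for StandardAnalyzer: tokenize, then lowercase.
    static List<String> standardLike(String s) {
        List<String> out = new ArrayList<String>();
        for (String t : s.split("\\W+")) out.add(t.toLowerCase(Locale.ROOT));
        return out;
    }

    // Rough stand-in for the custom case-sensitive analyzer: tokenize only.
    static List<String> caseSensitive(String s) {
        return Arrays.asList(s.split("\\W+"));
    }

    public static void main(String[] args) {
        String line = "INSIDE POST OF Listener\\";
        System.out.println(standardLike(line));  // [inside, post, of, listener]
        System.out.println(caseSensitive(line)); // [INSIDE, POST, OF, Listener]
    }
}
```

Since TermQuery strings are built by hand in the posted search code, they must match the indexed form byte for byte; a hand-built term "INSIDE" matches only the case-sensitive index, because a StandardAnalyzer index stores "inside". That is Ian's "test it and prove it" point.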
My final suggestion is that you build and post the smallest possible self-contained program, using RAMDirectory and no external classes. If you are using a custom analyzer, try it without - if that works you've got a clue as to where to look next. Good luck. On Wed, Aug 14, 2013 at 3:46 PM, Ankit Murarka <ankit.mura...@rancoretech.com> wrote: > Hello. I gave the complete code sample so that anyone can try it and let me > know. This issue is really taking a toll on me; I am so close yet so far. > > Yes, I am using an analyzer to index the documents. The analyzer is > StandardAnalyzer, but I have commented out the LowerCaseFilter code in it. > > Yes, I mentioned the same in my earlier mail. > > This is what is present in my file: > > INSIDE POST OF Listener\ > > This is what is present in the index: > > INSIDE POST OF Listener\ > > The query which I gave to search: > > Query is +contents:INSIDE contents:POST > > STILL I AM GETTING NO HIT. But if I index all the documents normally > (without indexing them line by line) I do get HITS. > > Still not able to figure out the problem. > > On 8/14/2013 8:07 PM, Ian Lea wrote: >> >> I was rather hoping for something smaller! >> >> One suggestion from a glance is that you're using some analyzer >> somewhere but building a BooleanQuery out of a TermQuery or two. Are >> you sure (test it and prove it) that the strings you pass to the >> TermQuery are EXACTLY what has been indexed? >> >> -- >> Ian. >> >> On Wed, Aug 14, 2013 at 3:29 PM, Ankit Murarka >> <ankit.mura...@rancoretech.com> wrote: >>> >>> Hello. The problem is as follows: >>> >>> I have a document containing information in lines, so I am indexing all >>> files line by line. >>> So if in my document I have >>> INSIDE POST OF SERVER\ >>> and in my index file created I have >>> INSIDE POST OF SERVER\ >>> and I fire a boolean query with INSIDE and POST with MUST/MUST, I am >>> getting >>> no HIT. 
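One detail worth isolating in such a minimal program: the indexing code quoted later in this thread adds each line with StringField, and Lucene's StringField is indexed as a single untokenized term, bypassing the analyzer entirely. A TermQuery for "INSIDE" can therefore never equal the one stored term "INSIDE POST OF Listener\". The sketch below models that difference in plain Java (this is not the Lucene API; the term-set model and the `\W+` split are simplifications):

```java
import java.util.*;

public class FieldTypeSketch {
    // StringField model: the whole value becomes ONE untokenized term.
    static Set<String> stringFieldTerms(String value) {
        return Collections.singleton(value);
    }

    // TextField model: the value is analyzed into separate terms.
    // The \W+ split is a crude stand-in for StandardTokenizer.
    static Set<String> textFieldTerms(String value) {
        return new HashSet<String>(Arrays.asList(value.split("\\W+")));
    }

    public static void main(String[] args) {
        String line = "INSIDE POST OF Listener\\";
        // A TermQuery is an exact lookup of one term in the index:
        System.out.println(stringFieldTerms(line).contains("INSIDE")); // false: no hit
        System.out.println(textFieldTerms(line).contains("INSIDE"));   // true: hit
    }
}
```

If the goal is term and phrase search over individual lines, indexing the "contents" field as an analyzed TextField rather than a StringField would be the usual approach, though only the original poster can confirm that fits the other requirements.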
>>> >>> I am providing the complete CODE I am using to create INDEX and TO >>> SEARCH..Both are drawn from sample code present online. >>> >>> /*INDEX CODE: >>> */ >>> package org.RunAllQueriesWithLineByLinePhrases; >>> >>> public class CreateIndex { >>> public static void main(String[] args) { >>> String indexPath = "D:\\INDEXFORQUERY"; //Place where indexes will >>> be >>> created >>> String docsPath="Indexed"; //Place where the files are kept. >>> boolean create=true; >>> final File docDir = new File(docsPath); >>> if (!docDir.exists() || !docDir.canRead()) { >>> System.exit(1); >>> } >>> try { >>> Directory dir = FSDirectory.open(new File(indexPath)); >>> Analyzer analyzer=new >>> CustomAnalyzerForCaseSensitive(Version.LUCENE_44); >>> IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_44, >>> analyzer); >>> if (create) { >>> iwc.setOpenMode(OpenMode.CREATE_OR_APPEND); >>> } else { >>> System.out.println("Trying to set IWC mode to UPDATE...NOT >>> DESIRED.."); >>> } >>> IndexWriter writer = new IndexWriter(dir, iwc); >>> indexDocs(writer, docDir); >>> writer.close(); >>> } catch (IOException e) { >>> System.out.println(" caught a " + e.getClass() + >>> "\n with message: " + e.getMessage()); >>> } >>> } >>> static void indexDocs(IndexWriter writer, File file) >>> throws IOException { >>> if (file.canRead()) >>> { >>> if (file.isDirectory()) { >>> String[] files = file.list(); >>> if (files != null) { >>> for (int i = 0; i< files.length; i++) { >>> if(files[i]!=null) >>> indexDocs(writer, new File(file, files[i])); >>> } >>> } >>> } else { >>> try { >>> Document doc = new Document(); >>> Field pathField = new StringField("path", file.getPath(), >>> Field.Store.YES); >>> doc.add(pathField); >>> doc.add(new LongField("modified", file.lastModified(), >>> Field.Store.NO)); >>> LineNumberReader lnr=new LineNumberReader(new >>> FileReader(file)); >>> String line=null; >>> while( null != (line = lnr.readLine()) ){ >>> doc.add(new 
StringField("contents",line,Field.Store.YES)); >>> } >>> if (writer.getConfig().getOpenMode() == OpenMode.CREATE) { >>> writer.addDocument(doc); >>> } else { >>> writer.updateDocument(new Term("path", file.getPath()), >>> doc); >>> } >>> } finally { >>> } >>> } >>> } >>> } } >>> >>> /*SEARCHING CODE:-*/ >>> >>> package org.RunAllQueriesWithLineByLinePhrases; >>> >>> public class SearchFORALLQUERIES { >>> public static void main(String[] args) throws Exception { >>> >>> String[] argument=new String[20]; >>> argument[0]="-index"; >>> argument[1]="D:\\INDEXFORQUERY"; >>> argument[2]="-field"; >>> argument[3]="contents"; //field value >>> argument[4]="-repeat"; >>> argument[5]="2"; //repeat value >>> argument[6]="-raw"; >>> argument[7]="-paging"; >>> argument[8]="300"; //paging value >>> >>> String index = "index"; >>> String field = "contents"; >>> String queries = null; >>> int repeat = 0; >>> boolean raw = false; >>> String queryString = null; >>> int hitsPerPage = 10; >>> >>> for(int i = 0;i< argument.length;i++) { >>> if ("-index".equals(argument[i])) { >>> index = argument[i+1]; >>> i++; >>> } else if ("-field".equals(argument[i])) { >>> field = argument[i+1]; >>> i++; >>> } else if ("-queries".equals(argument[i])) { >>> queries = argument[i+1]; >>> i++; >>> } else if ("-query".equals(argument[i])) { >>> queryString = argument[i+1]; >>> i++; >>> } else if ("-repeat".equals(argument[i])) { >>> repeat = Integer.parseInt(argument[i+1]); >>> i++; >>> } else if ("-raw".equals(argument[i])) { >>> raw = true; //set it true to just display the count. If false >>> then >>> it also display file name. 
>>> } else if ("-paging".equals(argument[i])) { >>> hitsPerPage = Integer.parseInt(argument[i+1]); >>> if (hitsPerPage<= 0) { >>> System.err.println("There must be at least 1 hit per page."); >>> System.exit(1); >>> } >>> i++; >>> } >>> } >>> System.out.println("processing input"); >>> IndexReader reader = DirectoryReader.open(FSDirectory.open(new >>> File(index))); //location where indexes are. >>> IndexSearcher searcher = new IndexSearcher(reader); >>> BufferedReader in = null; >>> if (queries != null) { >>> in = new BufferedReader(new InputStreamReader(new >>> FileInputStream(queries), "UTF-8")); //provide query as input >>> } else { >>> in = new BufferedReader(new InputStreamReader(System.in, "UTF-8")); >>> } >>> while (true) { >>> if (queries == null&& queryString == null) { >>> // >>> >>> prompt the user >>> System.out.println("Enter query: "); //if query is not present, >>> prompt the user to enter query. >>> } >>> String line = queryString != null ? queryString : in.readLine(); >>> >>> if (line == null || line.length() == -1) { >>> break; >>> } >>> line = line.trim(); >>> if (line.length() == 0) { >>> break; >>> } >>> String[] str=line.split(" "); >>> System.out.println("queries are " + str[0] + " and is " + str[1]); >>> Query query1 = new TermQuery(new Term(field, str[0])); >>> Query query2=new TermQuery(new Term(field,str[1])); >>> BooleanQuery booleanQuery = new BooleanQuery(); >>> booleanQuery.add(query1, BooleanClause.Occur.MUST); >>> booleanQuery.add(query2, BooleanClause.Occur.MUST); >>> if (repeat> 0) { //repeat=2 // repeat& >>> time as benchmark >>> Date start = new Date(); >>> for (int i = 0; i< repeat; i++) { >>> searcher.search(booleanQuery, null, 100); >>> } >>> Date end = new Date(); >>> System.out.println("Time: >>> "+(end.getTime()-start.getTime())+"ms"); >>> } >>> doPagingSearch(in, searcher, booleanQuery, hitsPerPage, raw, >>> queries >>> == null&& queryString == null); >>> >>> if (queryString != null) { >>> break; >>> } >>> } >>> 
reader.close(); >>> } >>> public static void doPagingSearch(BufferedReader in, IndexSearcher >>> searcher, Query query, >>> int hitsPerPage, boolean raw, >>> boolean >>> interactive) throws IOException { >>> TopDocs results = searcher.search(query, 5 * hitsPerPage); >>> ScoreDoc[] hits = results.scoreDocs; >>> int numTotalHits = results.totalHits; >>> System.out.println(numTotalHits + " total matching documents"); >>> int start = 0; >>> int end = Math.min(numTotalHits, hitsPerPage); >>> while (true) { >>> if (end> hits.length) { >>> System.out.println("Only results 1 - " + hits.length +" of " + >>> numTotalHits + " total matching documents collected."); >>> System.out.println("Collect more (y/n) ?"); >>> String line = in.readLine(); >>> if (line.length() == 0 || line.charAt(0) == 'n') { >>> break; >>> } >>> hits = searcher.search(query, numTotalHits).scoreDocs; >>> } >>> end = Math.min(hits.length, start + hitsPerPage); //3 and 5. >>> for (int i = start; i< end; i++) { //0 to 3. >>> if (raw) { >>> >>> System.out.println("doc="+hits[i].doc+" >>> score="+hits[i].score); >>> } >>> Document doc = searcher.doc(hits[i].doc); >>> List<IndexableField> filed=doc.getFields(); >>> filed.size(); >>> String path = doc.get("path"); >>> if (path != null) { >>> System.out.println((i+1) + ". " + path); >>> String title = doc.get("title"); >>> if (title != null) { >>> System.out.println(" Title: " + doc.get("title")); >>> } >>> } else { >>> System.out.println((i+1) + ". 
" + "No path for this >>> document"); >>> } >>> } >>> if (!interactive || end == 0) { >>> break; >>> } >>> if (numTotalHits>= end) { >>> boolean quit = false; >>> while (true) { >>> System.out.print("Press "); >>> if (start - hitsPerPage>= 0) { >>> System.out.print("(p)revious page, "); >>> } >>> if (start + hitsPerPage< numTotalHits) { >>> System.out.print("(n)ext page, "); >>> } >>> System.out.println("(q)uit or enter number to jump to a >>> page."); >>> String line = in.readLine(); >>> if (line.length() == 0 || line.charAt(0)=='q') { >>> quit = true; >>> break; >>> } >>> if (line.charAt(0) == 'p') { >>> start = Math.max(0, start - hitsPerPage); >>> break; >>> } else if (line.charAt(0) == 'n') { >>> if (start + hitsPerPage< numTotalHits) { >>> start+=hitsPerPage; >>> } >>> break; >>> } else { >>> int page = Integer.parseInt(line); >>> if ((page - 1) * hitsPerPage< numTotalHits) { >>> start = (page - 1) * hitsPerPage; >>> break; >>> } else { >>> System.out.println("No such page"); >>> } >>> } >>> } >>> if (quit) break; >>> end = Math.min(numTotalHits, start + hitsPerPage); >>> } >>> } >>> } >>> } >>> >>> /*CUSTOM ANALYZER CODE:*/ >>> >>> package com.rancore.demo; >>> >>> import java.io.IOException; >>> import java.io.Reader; >>> >>> import org.apache.lucene.analysis.TokenStream; >>> import org.apache.lucene.analysis.core.StopAnalyzer; >>> import org.apache.lucene.analysis.core.StopFilter; >>> import org.apache.lucene.analysis.standard.StandardFilter; >>> import org.apache.lucene.analysis.standard.StandardTokenizer; >>> import org.apache.lucene.analysis.util.CharArraySet; >>> import org.apache.lucene.analysis.util.StopwordAnalyzerBase; >>> import org.apache.lucene.util.Version; >>> >>> public class CustomAnalyzerForCaseSensitive extends StopwordAnalyzerBase >>> { >>> >>> public static final int DEFAULT_MAX_TOKEN_LENGTH = 255; >>> private int maxTokenLength = DEFAULT_MAX_TOKEN_LENGTH; >>> public static final CharArraySet STOP_WORDS_SET = >>> 
StopAnalyzer.ENGLISH_STOP_WORDS_SET; >>> public CustomAnalyzerForCaseSensitive(Version matchVersion, >>> CharArraySet stopWords) { >>> super(matchVersion, stopWords); >>> } >>> public CustomAnalyzerForCaseSensitive(Version matchVersion) { >>> this(matchVersion, STOP_WORDS_SET); >>> } >>> public CustomAnalyzerForCaseSensitive(Version matchVersion, Reader >>> stopwords) throws IOException { >>> this(matchVersion, loadStopwordSet(stopwords, >>> matchVersion)); >>> } >>> public void setMaxTokenLength(int length) { >>> maxTokenLength = length; >>> } >>> /** >>> * @see #setMaxTokenLength >>> */ >>> public int getMaxTokenLength() { >>> return maxTokenLength; >>> } >>> @Override >>> protected TokenStreamComponents createComponents(final String >>> fieldName, >>> final Reader reader) { >>> final StandardTokenizer src = new >>> StandardTokenizer(matchVersion, >>> reader); >>> src.setMaxTokenLength(maxTokenLength); >>> TokenStream tok = new StandardFilter(matchVersion, src); >>> // tok = new LowerCaseFilter(matchVersion, tok); >>> tok = new StopFilter(matchVersion, tok, stopwords); >>> return new TokenStreamComponents(src, tok) { >>> @Override >>> protected void setReader(final Reader reader) throws >>> IOException { >>> >>> >>> src.setMaxTokenLength(CustomAnalyzerForCaseSensitive.this.maxTokenLength); >>> super.setReader(reader); >>> } >>> }; >>> } >>> } >>> >>> >>> >>> I HOPE I HAVE GIVEN THE COMPLETE CODE SAMPLE FOR PEOPLE TO WORK ON.. >>> >>> PLEASE GUIDE ME NOW: IN case any further information is required please >>> let >>> me know. >>> >>> >>> On 8/14/2013 7:43 PM, Ian Lea wrote: >>> >>>> >>>> Well, you have supplied a bit more info - good - but I still can't >>>> spot the problem. Unless someone else can I suggest you post a very >>>> small self-contained program that demonstrates the problem. >>>> >>>> >>>> -- >>>> Ian. >>>> >>>> >>>> On Wed, Aug 14, 2013 at 2:50 PM, Ankit Murarka >>>> <ankit.mura...@rancoretech.com> wrote: >>>> >>>> >>>>> >>>>> Hello. 
>>>>> The problem does not seem to be getting solved. >>>>> >>>>> As mentioned, I am indexing each line of each file. >>>>> The sample text present inside LUKE is >>>>> >>>>> <am name="notification" value="10"/>\ >>>>> <type="DE">\ >>>>> java.lang.Thread.run(Thread.java:619) >>>>> >>>>> >>>>>>> >>>>>>> Size of list array::0\ >>>>>>> >>>>>>> >>>>> >>>>> at java.lang.reflect.Method.invoke(Method.java:597) >>>>> org.com.dummy,INFO,<< Still figuring out how to run >>>>> >>>>> >>>>>>> >>>>>>> ,SERVER,100.100.100.100:8080,EXCEPTION,10613349 >>>>>>> >>>>>>> >>>>> >>>>> INSIDE POST OF Listener\ >>>>> >>>>> In my Luke, I can see the text as "INSIDE POST OF Listener" .. This is >>>>> present in many files. >>>>> >>>>> /*Query is +contents:INSIDE contents:POST */ --/The field >>>>> name >>>>> is contents. Same analyzer is being used. This is a boolean query./ >>>>> >>>>> To test, I indexed only 20 files. In 19 files, this is present. >>>>> >>>>> The boolean query should give me a hit for this document. >>>>> >>>>> BUT IT IS RETURNING ME NO HIT.. >>>>> >>>>> If I index the same files WITHOUT line by line then, it gives me proper >>>>> hits.. >>>>> >>>>> But for me it should work on Indexes created by Line by Line parsing >>>>> also. >>>>> >>>>> Please guide. >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> On 8/13/2013 4:41 PM, Ian Lea wrote: >>>>> >>>>> >>>>>> >>>>>> remedialaction != "remedial action"? >>>>>> >>>>>> Show us your query. Show a small self-contained sample program or >>>>>> test case that demonstrates the problem. You need to give us >>>>>> something more to go on. >>>>>> >>>>>> >>>>>> -- >>>>>> Ian. >>>>>> >>>>>> >>>>>> On Tue, Aug 13, 2013 at 11:13 AM, Ankit Murarka >>>>>> <ankit.mura...@rancoretech.com> wrote: >>>>>> >>>>>> >>>>>> >>>>>>> >>>>>>> Hello, >>>>>>> I am aware of that link and I have been through that link >>>>>>> many >>>>>>> number of times. >>>>>>> >>>>>>> Problem I have is: >>>>>>> >>>>>>> 1. Each line is indexed. 
So indexed line looks something like >>>>>>> "<attribute >>>>>>> name="remedial action" value="Checking"/>\" >>>>>>> 2. I am easily firing a phrase query on this line. It suggest me the >>>>>>> possible values. No problem,. >>>>>>> 3. If I fire a Boolean Query with "remedialaction" and "Checking" as >>>>>>> a >>>>>>> must/must , then it is not providing me this document as a hit. >>>>>>> 4. I am using StandardAnalyzer both during the indexing and searching >>>>>>> time. >>>>>>> >>>>>>> >>>>>>> On 8/13/2013 2:31 PM, Ian Lea wrote: >>>>>>> >>>>>>> >>>>>>> >>>>>>>> >>>>>>>> Should be straightforward enough. Work through the tips in the FAQ >>>>>>>> entry at >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> http://wiki.apache.org/lucene-java/LuceneFAQ#Why_am_I_getting_no_hits_.2F_incorrect_hits.3F >>>>>>>> and post back if that doesn't help, with details of how you are >>>>>>>> analyzing the data and how you are searching. >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> Ian. >>>>>>>> >>>>>>>> >>>>>>>> On Tue, Aug 13, 2013 at 8:56 AM, Ankit Murarka >>>>>>>> <ankit.mura...@rancoretech.com> wrote: >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>>> Hello All, >>>>>>>>> I have 2 different usecases. >>>>>>>>> I am trying to provide both boolean query and phrase search query >>>>>>>>> in >>>>>>>>> the >>>>>>>>> application. >>>>>>>>> >>>>>>>>> In every line of the document which I am indexing I have content >>>>>>>>> like >>>>>>>>> : >>>>>>>>> >>>>>>>>> <attribute name="remedial action" value="Checking"/>\ >>>>>>>>> >>>>>>>>> Due to the phrase search requirement, I am indexing each line of >>>>>>>>> the >>>>>>>>> file >>>>>>>>> as >>>>>>>>> a new document. >>>>>>>>> >>>>>>>>> Now when I am trying to do a phrase query (Did you Mean, Infix >>>>>>>>> Analyzer >>>>>>>>> etc, >>>>>>>>> or phrase suggest) this seems to work fine and provide me with >>>>>>>>> desired >>>>>>>>> suggestions. >>>>>>>>> >>>>>>>>> Problem is : >>>>>>>>> >>>>>>>>> How do I invoke boolean query for this. 
I mean when I verified the >>>>>>>>> indexes >>>>>>>>> in Luke, I saw that the whole line is indexed, as expected. >>>>>>>>> >>>>>>>>> So, if a user wishes to perform a boolean query, say one containing >>>>>>>>> "remedialaction" and "Checking", how do I get this document as a >>>>>>>>> hit? I believe that since I am indexing each line, this is a bit tricky. >>>>>>>>> >>>>>>>>> Please guide. >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Regards >>>>>>>>> >>>>>>>>> Ankit >>>>>>>>> >>>>>>>>> --------------------------------------------------------------------- >>>>>>>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org >>>>>>>>> For additional commands, e-mail: java-user-h...@lucene.apache.org >>>>>>>>> >>>>>>>> >>>>>>> -- >>>>>>> Regards >>>>>>> >>>>>>> Ankit Murarka >>>>>>> >>>>>>> "What lies behind us and what lies before us are tiny matters >>>>>>> compared >>>>>>> with >>>>>>> what lies within us" >>>>>>> >>>>>> >>>>> -- >>>>> Regards >>>>> >>>>> Ankit Murarka >>>>> >>>>> "What lies behind us and what lies before us are tiny matters compared >>>>> with 
>>>>> what lies within us" --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org
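A final note on the search loop quoted earlier in the thread: it splits the query on a single space and unconditionally reads str[1], so any one-term query throws ArrayIndexOutOfBoundsException. A defensive plain-Java sketch of the same MUST/MUST intent (a model of matching, not the Lucene BooleanQuery API; the names are illustrative):

```java
import java.util.*;

public class MustMustSketch {
    // Model of a MUST/MUST BooleanQuery: a document matches only if its
    // term set contains every query term.
    static boolean matchesAll(Set<String> docTerms, List<String> queryTerms) {
        return docTerms.containsAll(queryTerms);
    }

    // Defensive split: trims, tolerates runs of whitespace, any term count.
    static List<String> parseQuery(String line) {
        line = line.trim();
        if (line.isEmpty()) return Collections.emptyList();
        return Arrays.asList(line.split("\\s+"));
    }

    public static void main(String[] args) {
        Set<String> doc = new HashSet<String>(
                Arrays.asList("INSIDE", "POST", "OF", "Listener"));
        System.out.println(matchesAll(doc, parseQuery("INSIDE POST")));   // true
        System.out.println(matchesAll(doc, parseQuery("INSIDE SERVER"))); // false
        System.out.println(matchesAll(doc, parseQuery("INSIDE")));        // true, no crash on one term
    }
}
```

With the real API, the equivalent would be adding one TermQuery per parsed term to the BooleanQuery with Occur.MUST, instead of hard-coding exactly two clauses.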