Hi All,

I have an application that has to count how many times a specific regular expression matches in a particular field, for each document in an indexed directory.
For example, let's say I have 10 documents in the directory and each document has 3 fields: "table", "column" and "data".

Example Doc(s):

//***************************************************************
Document doc1 = new Document();
doc1.add(new Field("table", "EMPLOYEE_US", Field.Store.NO, Field.Index.ANALYZED));
doc1.add(new Field("column", "F_NAME", Field.Store.NO, Field.Index.ANALYZED));
doc1.add(new Field("data", "Chris Hank Tony Cody Tom Tina Crystal", Field.Store.NO, Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS));

Document doc2 = new Document();
doc2.add(new Field("table", "EMPLOYEE_CA", Field.Store.NO, Field.Index.ANALYZED));
doc2.add(new Field("column", "F_NAME", Field.Store.NO, Field.Index.ANALYZED));
doc2.add(new Field("data", "Bob Billy Tom Toby Charles Krista Madonna", Field.Store.NO, Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS));

// Index both documents.
IndexWriter writer = new IndexWriter(directory, new WhitespaceAnalyzer(), true, IndexWriter.MaxFieldLength.LIMITED);
writer.addDocument(doc1);
writer.addDocument(doc2);
writer.optimize();
writer.close();

// I know I can create a query to search for a regular expression and that will return each
// document that contains a match.
searcher = new IndexSearcher(directory);
RegexQuery query = new RegexQuery(new Term("data", "^T.*"));
// Grab the score docs and go through them to find the documents that contain a match.
ScoreDoc[] hits = searcher.search(query, null, maxNumOfHits).scoreDocs;
//***************************************************************

The code above will tell me that both doc1 and doc2 contain a match for the constructed query. However, I need to know how many times the regular expression was matched in each document, i.e.

doc1 = 3
doc2 = 2

I hope I am being clear, and thanks in advance.

Cheers

--
View this message in context: http://old.nabble.com/Finding-frequency-of-regex-query-match-in-a-field-tp27175040p27175040.html
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.
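
P.S. To make the expected counts concrete, here is a plain-Java sketch of the result I am after. It does not use Lucene at all: the class name RegexTokenCounter and the method countMatches are made up for illustration, and it assumes the field is split on whitespace (as WhitespaceAnalyzer would do) and that the regex has to match a whole token.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene: shows the per-document count I want.
public class RegexTokenCounter {

    // Counts the whitespace-separated tokens in fieldText that the regex matches in full.
    public static int countMatches(String fieldText, String regex) {
        Pattern pattern = Pattern.compile(regex);
        int count = 0;
        for (String token : fieldText.trim().split("\\s+")) {
            // split() yields one empty token for an empty field; skip it.
            if (!token.isEmpty() && pattern.matcher(token).matches()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String regex = "^T.*";
        // Same "data" values as doc1 and doc2 above.
        System.out.println("doc1 = " + countMatches("Chris Hank Tony Cody Tom Tina Crystal", regex));
        System.out.println("doc2 = " + countMatches("Bob Billy Tom Toby Charles Krista Madonna", regex));
    }
}
```

This prints doc1 = 3 and doc2 = 2. What I am looking for is a way to get the same numbers out of the index itself, without re-reading the original text.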