Could I ask you to re-post this on the java user's list? This list is for *internal* Lucene development discussion.
Thanks Erick On Fri, Jan 15, 2010 at 8:28 AM, Altimatic <chris.stuckl...@gmail.com>wrote: > > Hi All, > > I have an application that has to count the frequency that a specific > regular expression is matched on a particular field for each document in an > indexed directory. > > For example. > > Lets say I have 10 documents in the directory and each document has 3 > fields, "table", "column" and "data". > > Example Doc(s): > //*************************************************************** > Document doc1 = new Document(); > doc1.add(new Field("table", "EMPLOYEE_US", Field.Store.NO, > Field.Index.ANALYZED); > doc11.add(new Field("column", "F_NAME", Field.Store.NO, > Field.Index.ANALYZED); > doc.add(new Field("data", "Chris Hank Tony Cody Tom Tina Crystal", > Field.Store.NO, Field.Index.ANALYZED, > Field.TermVector.WITH_POSITIONS_OFFSETS); > > Document doc2 = new Document(); > doc2.add(new Field("table", "EMPLOYEE_CA", Field.Store.NO, > Field.Index.ANALYZED); > doc2.add(new Field("column", "F_NAME", Field.Store.NO, > Field.Index.ANALYZED); > doc2.add(new Field("data", "Bob Billy Tom Toby Charles Krista Madonna", > Field.Store.NO, Field.Index.ANALYZED, > Field.TermVector.WITH_POSITIONS_OFFSETS); > > //I know I can create a query to search for a regular expression and that > will return each > //document that contains a match. > > IndexWriter writer = new IndexWriter(directory, new WhitespaceAnalyzer(), > true, > > IndexWriter.MaxFieldLength.LIMITED); > writer.addDocument(doc); > writer.optimize(); > writer.close(); > searcher = new IndexSearcher(directory); > > RegexQuery query = new RegexQuery( newTerm("data", "^T.*)); > ScoreDoc[] hits = searcher.search(query, null, > maxNumOfHits).scoreDocs;//grab the score docs and go through them to find > the documents that contain a match > > //***************************************************** > > > The code above will tell me that both doc1 and doc2 contain a match for the > constructed query. > > However I need to know how many times the regular expression was matched in > each document. ie. > > doc1 = 3 > doc2 = 2 > > I hope I am being clear...and thanks in advance. > > > Cheers > > -- > View this message in context: > http://old.nabble.com/Finding-frequency-of-regex-query-match-in-a-field-tp27175040p27175040.html > Sent from the Lucene - Java Developer mailing list archive at Nabble.com. > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-dev-h...@lucene.apache.org > >