Hi All,
I have an application that needs to count how many times a specific
regular expression matches a particular field in each document of an
indexed directory.
For example, let's say I have 10 documents in the directory and each
document has three fields: "table", "column" and "data".
Example documents:
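//***************************************************************
import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.regex.RegexQuery; // contrib "regex" module

Document doc1 = new Document();
doc1.add(new Field("table", "EMPLOYEE_US", Field.Store.NO,
        Field.Index.ANALYZED));
doc1.add(new Field("column", "F_NAME", Field.Store.NO,
        Field.Index.ANALYZED));
// term vectors store each document's terms and how often each one occurs
doc1.add(new Field("data", "Chris Hank Tony Cody Tom Tina Crystal",
        Field.Store.NO, Field.Index.ANALYZED,
        Field.TermVector.WITH_POSITIONS_OFFSETS));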
Document doc2 = new Document();
doc2.add(new Field("table", "EMPLOYEE_CA", Field.Store.NO,
        Field.Index.ANALYZED));
doc2.add(new Field("column", "F_NAME", Field.Store.NO,
        Field.Index.ANALYZED));
doc2.add(new Field("data", "Bob Billy Tom Toby Charles Krista Madonna",
        Field.Store.NO, Field.Index.ANALYZED,
        Field.TermVector.WITH_POSITIONS_OFFSETS));
// I know I can create a query that searches for a regular expression and
// returns each document that contains a match.
IndexWriter writer = new IndexWriter(directory, new WhitespaceAnalyzer(),
        true, IndexWriter.MaxFieldLength.LIMITED);
writer.addDocument(doc1);
writer.addDocument(doc2);
writer.optimize();
writer.close();
IndexSearcher searcher = new IndexSearcher(directory);
RegexQuery query = new RegexQuery(new Term("data", "^T.*"));
// grab the ScoreDocs and go through them to find the documents that contain a match
ScoreDoc[] hits = searcher.search(query, null, maxNumOfHits).scoreDocs;
//*****************************************************
The code above will tell me that both doc1 and doc2 contain a match for the
constructed query.
However, I need to know how many times the regular expression matched in
each document, i.e.:
doc1 = 3 (Tony, Tom, Tina)
doc2 = 2 (Tom, Toby)
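One possible way to get those numbers, offered as a sketch rather than a tested Lucene recipe: because "data" is indexed with `Field.TermVector.WITH_POSITIONS_OFFSETS`, each hit's term vector can be read back (in Lucene 2.9/3.0, roughly `searcher.getIndexReader().getTermFreqVector(hit.doc, "data")`, whose `getTerms()` and `getTermFrequencies()` return parallel arrays), and the frequencies of the terms the regex matches can be summed. To keep the example runnable without Lucene on the classpath, the hypothetical `FieldTermVector` class and `fromText` helper below stand in for the real term vector, and whole-term matching via `Matcher.matches()` is assumed to mirror how `RegexQuery` tests each term.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

// Stdlib-only sketch: count regex matches per document from a term vector.
class TermVectorRegexCount {

    // Hypothetical stand-in for Lucene's TermFreqVector: parallel arrays holding
    // the distinct terms of one document's field and the frequency of each term.
    static final class FieldTermVector {
        final String[] terms;
        final int[] freqs;

        FieldTermVector(String[] terms, int[] freqs) {
            this.terms = terms;
            this.freqs = freqs;
        }
    }

    // Hypothetical helper: builds a term vector by splitting on whitespace with
    // no lowercasing, mimicking the tokens WhitespaceAnalyzer produces.
    static FieldTermVector fromText(String text) {
        Map<String, Integer> counts = new TreeMap<>(); // term -> occurrences
        for (String token : text.trim().split("\\s+")) {
            counts.merge(token, 1, Integer::sum);
        }
        String[] terms = new String[counts.size()];
        int[] freqs = new int[counts.size()];
        int i = 0;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            terms[i] = e.getKey();
            freqs[i] = e.getValue();
            i++;
        }
        return new FieldTermVector(terms, freqs);
    }

    // Sums the frequency of every term the regex matches in full
    // (matches(), not find()), so a term occurring twice counts twice.
    static int countMatches(FieldTermVector tv, Pattern regex) {
        int total = 0;
        for (int i = 0; i < tv.terms.length; i++) {
            if (regex.matcher(tv.terms[i]).matches()) {
                total += tv.freqs[i];
            }
        }
        return total;
    }

    public static void main(String[] args) {
        Pattern regex = Pattern.compile("^T.*");
        String[] data = {
            "Chris Hank Tony Cody Tom Tina Crystal",    // doc1
            "Bob Billy Tom Toby Charles Krista Madonna" // doc2
        };
        for (int d = 0; d < data.length; d++) {
            int n = countMatches(fromText(data[d]), regex);
            System.out.println("doc" + (d + 1) + " = " + n);
        }
    }
}
```

Running `main` prints `doc1 = 3` and `doc2 = 2`, the counts listed above. An alternative that needs no term vectors is to walk the index once per matching term instead of once per hit: enumerate the "data" terms with `IndexReader.terms(...)`, and for each term the regex matches, iterate `IndexReader.termDocs(term)` and add `freq()` to a per-document counter.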
I hope this is clear. Thanks in advance!
Cheers
--
View this message in context:
http://old.nabble.com/Finding-frequency-of-regex-query-match-in-a-field-tp27175040p27175040.html
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.