Could you please re-post this on the java-user list? This list is for
*internal* Lucene development discussion.

Thanks
Erick

On Fri, Jan 15, 2010 at 8:28 AM, Altimatic <chris.stuckl...@gmail.com> wrote:

>
> Hi All,
>
> I have an application that has to count how many times a specific
> regular expression matches a particular field in each document of an
> indexed directory.
>
> For example, let's say I have 10 documents in the directory and each
> document has 3 fields: "table", "column" and "data".
>
> Example Doc(s):
> //***************************************************************
> Document doc1 = new Document();
> doc1.add(new Field("table", "EMPLOYEE_US", Field.Store.NO,
> Field.Index.ANALYZED));
> doc1.add(new Field("column", "F_NAME", Field.Store.NO,
> Field.Index.ANALYZED));
> doc1.add(new Field("data", "Chris Hank Tony Cody Tom Tina Crystal",
> Field.Store.NO, Field.Index.ANALYZED,
> Field.TermVector.WITH_POSITIONS_OFFSETS));
>
> Document doc2 = new Document();
> doc2.add(new Field("table", "EMPLOYEE_CA", Field.Store.NO,
> Field.Index.ANALYZED));
> doc2.add(new Field("column", "F_NAME", Field.Store.NO,
> Field.Index.ANALYZED));
> doc2.add(new Field("data", "Bob Billy Tom Toby Charles Krista Madonna",
> Field.Store.NO, Field.Index.ANALYZED,
> Field.TermVector.WITH_POSITIONS_OFFSETS));
>
> // I know I can create a query that searches for a regular expression and
> // returns each document that contains a match.
>
> IndexWriter writer = new IndexWriter(directory, new WhitespaceAnalyzer(),
> true, IndexWriter.MaxFieldLength.LIMITED);
> writer.addDocument(doc1);
> writer.addDocument(doc2);
> writer.optimize();
> writer.close();
> searcher = new IndexSearcher(directory);
>
> RegexQuery query = new RegexQuery(new Term("data", "^T.*"));
> // Grab the score docs and go through them to find the documents that
> // contain a match.
> ScoreDoc[] hits = searcher.search(query, null, maxNumOfHits).scoreDocs;
>
> //*****************************************************
>
>
> The code above will tell me that both doc1 and doc2 contain a match for the
> constructed query.
>
> However, I need to know how many times the regular expression matched in
> each document, i.e.:
>
> doc1 = 3
> doc2 = 2
>
> I hope I am being clear, and thanks in advance.
>
>
> Cheers
>
> --
> View this message in context:
> http://old.nabble.com/Finding-frequency-of-regex-query-match-in-a-field-tp27175040p27175040.html
> Sent from the Lucene - Java Developer mailing list archive at Nabble.com.
>
>
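
One possible way to get the per-document counts asked about in the quoted message is to sum the frequencies of every term in the "data" field that the regular expression matches. The sketch below shows only that counting step in plain Java, with no Lucene dependency. The class name RegexTermCounter and the hard-coded term arrays are hypothetical; in a real index the two parallel arrays would presumably come from the field's term vector (TermFreqVector.getTerms() and getTermFrequencies(), obtained via IndexReader.getTermFreqVector(docId, "data")), which should be available here because the field is indexed with term vectors. It also assumes the regex is applied to each whole term, as RegexQuery does.

```java
import java.util.regex.Pattern;

public class RegexTermCounter {

    // Sums the frequencies of all terms that the pattern fully matches.
    // terms[i] occurs freqs[i] times in the field; the two arrays are
    // parallel, like the ones a Lucene term vector exposes (assumption:
    // the caller has already fetched them for one document and field).
    static int countMatches(String[] terms, int[] freqs, Pattern pattern) {
        int total = 0;
        for (int i = 0; i < terms.length; i++) {
            if (pattern.matcher(terms[i]).matches()) {
                total += freqs[i];
            }
        }
        return total;
    }

    public static void main(String[] args) {
        Pattern startsWithT = Pattern.compile("^T.*");

        // Hypothetical stand-ins for the term vectors of the two example
        // documents: WhitespaceAnalyzer keeps case, and each name occurs once.
        String[] doc1Terms = {"Chris", "Cody", "Crystal", "Hank", "Tina", "Tom", "Tony"};
        String[] doc2Terms = {"Billy", "Bob", "Charles", "Krista", "Madonna", "Toby", "Tom"};
        int[] once = {1, 1, 1, 1, 1, 1, 1};

        System.out.println("doc1 = " + countMatches(doc1Terms, once, startsWithT)); // doc1 = 3
        System.out.println("doc2 = " + countMatches(doc2Terms, once, startsWithT)); // doc2 = 2
    }
}
```

Running this counting step only for the documents returned in hits would keep the work proportional to the number of matching documents rather than the size of the index.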
