Hi,
This question is going to be a little complicated to explain, but let me try.
I have implemented an indexer app based on the demo IndexFiles app, and a web
app based on the luceneweb web app for the searching.
In my case, the "Documents" that I'm indexing are a proprietary file type, and
each document contains several "sub-documents". So, in my indexer, I parse each
of the sub-documents, and, for a given "Document", I build a long string
containing the terms that I extracted from each of the sub-documents, then I do:
    doc.add(new Field("contents", longstring, Field.Store.YES,
                      Field.Index.ANALYZED));
I also add the longstring to another non-indexed field, summary:
    doc.add(new Field("summary", longstring, Field.Store.YES, Field.Index.NO));
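For illustration only, here is a minimal sketch of how such a long string could be assembled. The class and method names (SummaryBuilder, buildLongString) are hypothetical, and the layout (one line per sub-document, sub-document name first, newline as the separator) is an assumption based on the display format described further down, not necessarily the real format.

```java
import java.util.List;

// Hypothetical helper: not part of Lucene or of the demo IndexFiles app.
final class SummaryBuilder {

    /**
     * Joins each sub-document's name and its terms into one line,
     * with one line per sub-document (assumed layout).
     */
    static String buildLongString(List<String> subDocNames,
                                  List<List<String>> subDocTerms) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < subDocNames.size(); i++) {
            if (i > 0) {
                sb.append('\n'); // assumed separator between sub-documents
            }
            sb.append(subDocNames.get(i));
            for (String term : subDocTerms.get(i)) {
                sb.append(' ').append(term);
            }
        }
        return sb.toString();
    }
}
```

The resulting string would then be passed as longstring to both the "contents" and "summary" fields.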
The modified luceneweb web app that I use is pretty vanilla, and originally,
what I was asked to do was to be able to search just for a Document, i.e.,
given a query like "X AND Y" (document containing both term=X and term=Y),
return the file path+name for the document. I was also displaying the terms
associated with each sub-document by parsing the 'summary' string.
So, for example, if "Document1" contained 3 sub-documents (which contained
(term1, term2), (term1a, term2a), and (term1b, term2b), respectively), and if I
queried for "term1a AND term2a", the web app would display something like:
    Document1 subdoc1 term1 term2
              subdoc2 term1a term2a
              subdoc3 term1b term2b
However, I've now been asked to implement the ability to query the
sub-documents.
In other words, rather than the web app displaying what I showed above, they
want it to return something like just:
    Document1 subdoc2 term1a term2a
Right now, the web app gets the 'summary' (again, in a long string), then just
breaks it into subdoc1, subdoc2, and subdoc3 lines, just for display purposes,
so to do what I've been asked, I need to query the 3 sub-strings from the
'summary', i.e., run the "term1a AND term2a" query against the following
strings:
subdoc1 term1 term2
subdoc2 term1a term2a
subdoc3 term1b term2b
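To make the idea concrete, here is a rough plain-Java sketch of running a pure "AND" query against each sub-document line. It is only an approximation: the names SubDocMatcher and matchAll are hypothetical, the newline separator is an assumption, and the lowercase/whitespace tokenizing merely imitates what a simple analyzer might do. It does not use Lucene's analyzer or query parser, so it would not necessarily match Lucene's behavior.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: a plain-Java approximation, not Lucene's query logic.
final class SubDocMatcher {

    /**
     * Returns the lines of the summary whose tokens include every
     * required term (i.e. "term1 AND term2 AND ...").
     */
    static List<String> matchAll(String summary, List<String> requiredTerms) {
        List<String> hits = new ArrayList<>();
        // Assumption: one sub-document per line in the stored summary.
        for (String line : summary.split("\n")) {
            // Tokenize on whitespace and lowercase, roughly like a simple analyzer.
            Set<String> tokens = new HashSet<>();
            for (String tok : line.trim().split("\\s+")) {
                tokens.add(tok.toLowerCase(Locale.ROOT));
            }
            boolean all = true;
            for (String term : requiredTerms) {
                if (!tokens.contains(term.toLowerCase(Locale.ROOT))) {
                    all = false;
                    break;
                }
            }
            if (all) {
                hits.add(line);
            }
        }
        return hits;
    }
}
```

With the three strings above and the required terms term1a and term2a, only the subdoc2 line is returned. Anything beyond simple conjunctions (OR, NOT, phrases, wildcards) is not handled by this sketch.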
I guess that I could write my own method to do that, but I want to make sure
that the sub-document/string query "duplicates" the behavior of the Lucene
query. It seems like there should be a way to reuse the Lucene query logic by
calling methods in Lucene itself.
I've been reviewing the Javadocs, but I'm still fairly new to Lucene, so I was
hoping that someone could point me in the right direction.
My apologies for the longish post, but I hope that I've been able to explain
clearly :)!!
Thanks,
Jim