Hi Erik. Many thanks for your reply. I'll likely see if I can find a
list to pose a couple of questions there way. I am having fun with
Lucene since it is new to me and I am impressed with the speed I am
getting. I am reading anything I can get hold of and trying different
code experiments. So far, the code is fairly straight forward so not so
concerned about this at the moment.
I am really hoping to hear from experienced people like yourself more on
strategically what to index, what sort of things it would be a good idea
to store and what to do about a fairly large schema that has much
metadata to offer. Also perhaps when sorting and filtering gets too
expensive. I realize that just because the metadata is available doesn't
necessarily mean you want to even put it all in an index. I think these
issues are pretty general, however I know there are folks on this that
would likely advise some particular path or direction because of their
own experiences with Lucene. I would really like to hear from anyone
that has been working with metadata particularly or anyone generally
about these topics.
Regards,
David
Erik Hatcher wrote:
One very nice implementation to take a look at is the Simile project at
MIT. The Piggy Bank and Longwell projects use Lucene to index RDF and
integrate full-text and structural queries nicely together.
http://simile.mit.edu
Erik
On Feb 21, 2006, at 10:20 PM, David Pratt wrote:
Hi there. I am new to Lucene and I have been developing a semantic
application for a while and it appears to me Lucene could help me to
get a much needed search with reasonable speed. I have some general
question to start:
1) Since my app is virtually all metadata, what should I store in the
indexes if anything?
2) Should I only index the most common properties that people will
search and combine the rest (and index this combined text as a field)?
3) I would like to sort and filter results but am concerned this
could be very memory intensive
4) Some general guidance on organizing indexes in an app would be
appreciated.
My schema is fairly large but I generally expect people to search on
about 6 to 8 properties for the most part. I have the data stored in
an sql database but not in a conventional way. I am willing to accept
a slower advanced search on less common properties (accomodating this
with sql search) but I really want some speed for the main properties
with full text search.
Pretty much everything in the app is metadata so I am most interested
in focussing on the 6-8 properties that people will use to search on
for the most part. I am thinking of combining the text of the
remaining properties (quite a number) into a single description type
field so that essentially all information gets indexed and ranked. Is
this a reasonable approach?
I see that there are advanced possibilities with the indexes to sort
and filter. How advisable is using sort for large record sets. For
example, say you have got 20000 records returned from your search.
Because this will have a web interface I will only be showing first
20 likely so I will be batching results. Is the sorting filtering
highly memory intensive?
Hopefully, someone can provide some initial advice. Many thanks.
Regards,
David
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]