Re: Searching/sorting strategy for many properties for semantic web app

David Pratt Wed, 22 Feb 2006 18:02:05 -0800

Hi Erik. Many thanks for your reply. I'll likely see if I can find a list to pose a couple of questions their way. I am having fun with Lucene since it is new to me, and I am impressed with the speed I am getting. I am reading anything I can get hold of and trying different code experiments. So far the code is fairly straightforward, so I am not so concerned about this at the moment.

I am really hoping to hear from experienced people like yourself more on strategically what to index, what sort of things it would be a good idea to store, and what to do about a fairly large schema that has much metadata to offer. Also, perhaps, when sorting and filtering get too expensive. I realize that just because the metadata is available doesn't necessarily mean you want to put it all in an index. I think these issues are pretty general; however, I know there are folks on this list who would likely advise some particular path or direction because of their own experiences with Lucene. I would really like to hear from anyone who has been working with metadata in particular, or from anyone generally, about these topics.


Regards,
David


Erik Hatcher wrote:

One very nice implementation to take a look at is the Simile project at MIT. The Piggy Bank and Longwell projects use Lucene to index RDF and integrate full-text and structural queries nicely together.
http://simile.mit.edu
    Erik

On Feb 21, 2006, at 10:20 PM, David Pratt wrote:
Hi there. I am new to Lucene. I have been developing a semantic application for a while, and it appears to me Lucene could help me get a much-needed search with reasonable speed. I have some general questions to start:
1) Since my app is virtually all metadata, what should I store in the indexes, if anything?
2) Should I only index the most common properties that people will search and combine the rest (and index this combined text as a field)?
3) I would like to sort and filter results but am concerned this could be very memory intensive.
4) Some general guidance on organizing indexes in an app would be appreciated.
My schema is fairly large, but I generally expect people to search on about 6 to 8 properties for the most part. I have the data stored in an SQL database, but not in a conventional way. I am willing to accept a slower advanced search on less common properties (accommodating this with SQL search), but I really want some speed for the main properties with full-text search.
Pretty much everything in the app is metadata, so I am most interested in focusing on the 6-8 properties that people will mostly search on. I am thinking of combining the text of the remaining properties (quite a number) into a single description-type field so that essentially all information gets indexed and ranked. Is this a reasonable approach?
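To make the combined-field idea concrete, here is a minimal sketch in plain Java. It only shows how a record's properties might be split into dedicated fields plus one catch-all field before being handed to an indexer. The class name `FieldPlanner`, the catch-all field name `description`, and the sample property names are all hypothetical, and no Lucene API is used here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration only; not part of Lucene.
final class FieldPlanner {
    // Name of the catch-all field (an assumption for this sketch).
    // Assumes no real property is itself named "description".
    static final String CATCH_ALL = "description";

    // Returns the fields to index: each primary property keeps its own
    // field, and the values of all other properties are joined into one
    // catch-all text field.
    static Map<String, String> plan(Map<String, String> properties,
                                    Set<String> primary) {
        Map<String, String> fields = new LinkedHashMap<>();
        List<String> rest = new ArrayList<>();
        for (Map.Entry<String, String> e : properties.entrySet()) {
            if (primary.contains(e.getKey())) {
                fields.put(e.getKey(), e.getValue());
            } else {
                rest.add(e.getValue());
            }
        }
        if (!rest.isEmpty()) {
            fields.put(CATCH_ALL, String.join(" ", rest));
        }
        return fields;
    }
}
```

Each entry of the returned map would then become one field of the indexed document. Whether any of them also needs to be stored is a separate decision.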
I see that there are advanced possibilities with the indexes to sort and filter. How advisable is using sort for large record sets? For example, say you have got 20000 records returned from your search. Because this will have a web interface, I will likely only be showing the first 20, so I will be batching results. Is the sorting and filtering highly memory intensive?
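On the "20 out of 20000" case: showing one page does not in itself require sorting every hit. The sketch below is plain Java, not Lucene's own sorting code, and the class name `TopN` is hypothetical. It keeps only the best n hits in a bounded heap, so the extra memory grows with the page size rather than with the number of hits. It makes no claim about what Lucene holds in memory internally when sorting by a field.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical helper for illustration only; not part of Lucene.
final class TopN {
    // Returns the first n hits according to 'order', in sorted order,
    // while holding at most n hits in the heap at any time.
    static <T> List<T> select(Iterable<T> hits, int n, Comparator<T> order) {
        if (n <= 0) {
            return new ArrayList<>();
        }
        // Reversed comparator: the head of the heap is the worst hit kept.
        PriorityQueue<T> heap =
                new PriorityQueue<>(n, Collections.reverseOrder(order));
        for (T hit : hits) {
            if (heap.size() < n) {
                heap.add(hit);
            } else if (order.compare(hit, heap.peek()) < 0) {
                // The new hit beats the worst kept hit, so replace it.
                heap.poll();
                heap.add(hit);
            }
        }
        List<T> page = new ArrayList<>(heap);
        page.sort(order);
        return page;
    }
}
```

For example, `TopN.select(hits, 20, byTitle)` would return the first page, where `byTitle` stands for whatever comparator the application sorts by. Later pages can be served by asking for a larger n and skipping the leading entries.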
Hopefully, someone can provide some initial advice. Many thanks.

Regards,
David

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
