Re: Searching/sorting strategy for many properties for semantic web app

2006-02-23 Thread Erik Hatcher


On Feb 22, 2006, at 9:01 PM, David Pratt wrote:
Hi Erik. Many thanks for your reply. I'll likely see if I can find  
a list to pose a couple of questions there way. I am having fun  
with Lucene since it is new to me and I am impressed with the speed  
I am getting. I am reading anything I can get hold of and trying  
different code experiments. So far, the code is fairly straight  
forward so not so concerned about this at the moment.


I am really hoping to hear from experienced people like yourself  
more on strategically what to index, what sort of things it would  
be a good idea to store and what to do about a fairly large schema  
that has much metadata to offer. Also perhaps when sorting and  
filtering gets too expensive. I realize that just because the  
metadata is available doesn't necessarily mean you want to even put  
it all in an index. I think these issues are pretty general,  
however I know there are folks on this that would likely advise  
some particular path or direction because of their own experiences  
with Lucene. I would really like to hear from anyone that has been  
working with metadata particularly or anyone generally about these  
topics.


In my University job, I'm dealing with a fair bit of metadata in the  
form of RDF about 19th century literature objects.  I'm indexing  
basic Dublin Core data such as title and author as individual fields,  
and also dropping all indexed metadata into a single searchable  
field.  I've been using Kowari as the metadata store, but it also has  
Lucene integration (that I've not tried myself yet).


I'm not sure what else to add as your query is a bit general.  I  
think you'll find if you post more specific questions you're more  
likely to get detailed responses.  General queries tend to be too  
general to respond to, I find.


There really are no best practices with Lucene in terms of what to  
index, what to store - these are all highly application dependent and  
is often something I tune as the application itself evolves.


Erik


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Searching/sorting strategy for many properties for semantic web app

2006-02-22 Thread Erik Hatcher
One very nice implementation to take a look at is the Simile project  
at MIT.   The Piggy Bank and Longwell projects use Lucene to index  
RDF and integrate full-text and structural queries nicely together.
http://simile.mit.edu


Erik

On Feb 21, 2006, at 10:20 PM, David Pratt wrote:

Hi there. I am new to Lucene and I have been developing a semantic  
application for a while and it appears to me Lucene could help me  
to get a much needed search with reasonable speed. I have some  
general question to start:


1) Since my app is virtually all metadata, what should I store in  
the indexes if anything?
2) Should I only index the most common properties that people will  
search and combine the rest (and index this combined text as a field)?
3) I would like to sort and filter results but am concerned this  
could be very memory intensive
4) Some general guidance on organizing indexes in an app would be  
appreciated.


My schema is fairly large but I generally expect people to search  
on about 6 to 8 properties for the most part. I have the data  
stored in an sql database but not in a conventional way. I am  
willing to accept a slower advanced search on less common  
properties (accomodating this with sql search) but I really want  
some speed for the main properties with full text search.


Pretty much everything in the app is metadata so I am most  
interested in  focussing on the 6-8 properties that people will use  
to search on for the most part. I am thinking of combining the text  
of the remaining properties (quite a number) into a single  
description type field so that essentially all information gets  
indexed and ranked. Is this a reasonable approach?


I see that there are advanced possibilities with the indexes to  
sort and filter. How advisable is using sort for large record sets.  
For example, say you have got 2 records returned from your  
search. Because this will have a web interface I will only be  
showing first 20 likely so I will be batching results. Is the  
sorting filtering highly memory intensive?


Hopefully, someone can provide some initial advice. Many thanks.

Regards,
David

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Searching/sorting strategy for many properties for semantic web app

2006-02-22 Thread David Pratt
Hi Erik. Many thanks for your reply. I'll likely see if I can find a 
list to pose a couple of questions there way. I am having fun with 
Lucene since it is new to me and I am impressed with the speed I am 
getting. I am reading anything I can get hold of and trying different 
code experiments. So far, the code is fairly straight forward so not so 
concerned about this at the moment.


I am really hoping to hear from experienced people like yourself more on 
strategically what to index, what sort of things it would be a good idea 
to store and what to do about a fairly large schema that has much 
metadata to offer. Also perhaps when sorting and filtering gets too 
expensive. I realize that just because the metadata is available doesn't 
necessarily mean you want to even put it all in an index. I think these 
issues are pretty general, however I know there are folks on this that 
would likely advise some particular path or direction because of their 
own experiences with Lucene. I would really like to hear from anyone 
that has been working with metadata particularly or anyone generally 
about these topics.


Regards,
David


Erik Hatcher wrote:
One very nice implementation to take a look at is the Simile project  at 
MIT.   The Piggy Bank and Longwell projects use Lucene to index  RDF and 
integrate full-text and structural queries nicely together.
http://simile.mit.edu


Erik

On Feb 21, 2006, at 10:20 PM, David Pratt wrote:

Hi there. I am new to Lucene and I have been developing a semantic  
application for a while and it appears to me Lucene could help me  to 
get a much needed search with reasonable speed. I have some  general 
question to start:


1) Since my app is virtually all metadata, what should I store in  the 
indexes if anything?
2) Should I only index the most common properties that people will  
search and combine the rest (and index this combined text as a field)?
3) I would like to sort and filter results but am concerned this  
could be very memory intensive
4) Some general guidance on organizing indexes in an app would be  
appreciated.


My schema is fairly large but I generally expect people to search  on 
about 6 to 8 properties for the most part. I have the data  stored in 
an sql database but not in a conventional way. I am  willing to accept 
a slower advanced search on less common  properties (accomodating this 
with sql search) but I really want  some speed for the main properties 
with full text search.


Pretty much everything in the app is metadata so I am most  interested 
in  focussing on the 6-8 properties that people will use  to search on 
for the most part. I am thinking of combining the text  of the 
remaining properties (quite a number) into a single  description type 
field so that essentially all information gets  indexed and ranked. Is 
this a reasonable approach?


I see that there are advanced possibilities with the indexes to  sort 
and filter. How advisable is using sort for large record sets.  For 
example, say you have got 2 records returned from your  search. 
Because this will have a web interface I will only be  showing first 
20 likely so I will be batching results. Is the  sorting filtering 
highly memory intensive?


Hopefully, someone can provide some initial advice. Many thanks.

Regards,
David

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]




-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Searching/sorting strategy for many properties for semantic web app

2006-02-21 Thread David Pratt
Hi there. I am new to Lucene and I have been developing a semantic 
application for a while and it appears to me Lucene could help me to get 
a much needed search with reasonable speed. I have some general question 
to start:


1) Since my app is virtually all metadata, what should I store in the 
indexes if anything?
2) Should I only index the most common properties that people will 
search and combine the rest (and index this combined text as a field)?
3) I would like to sort and filter results but am concerned this could 
be very memory intensive
4) Some general guidance on organizing indexes in an app would be 
appreciated.


My schema is fairly large but I generally expect people to search on 
about 6 to 8 properties for the most part. I have the data stored in an 
sql database but not in a conventional way. I am willing to accept a 
slower advanced search on less common properties (accomodating this with 
sql search) but I really want some speed for the main properties with 
full text search.


Pretty much everything in the app is metadata so I am most interested in 
 focussing on the 6-8 properties that people will use to search on for 
the most part. I am thinking of combining the text of the remaining 
properties (quite a number) into a single description type field so that 
essentially all information gets indexed and ranked. Is this a 
reasonable approach?


I see that there are advanced possibilities with the indexes to sort and 
filter. How advisable is using sort for large record sets. For example, 
say you have got 2 records returned from your search. Because this 
will have a web interface I will only be showing first 20 likely so I 
will be batching results. Is the sorting filtering highly memory intensive?


Hopefully, someone can provide some initial advice. Many thanks.

Regards,
David

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]