I wonder if the following approach would work, and so simplify your view 
of how the data should be indexed.

The approach would be to run a wide search first and then apply filters built 
from the faceted criteria data.

So from the examples you gave, the initial search would be:

Find X (people/companies/both) from location Y (/null) 

Then you would use a Filter or filters to give you a search within search, for 
example using a FieldCacheRangeFilter to limit date ranges, salaries, job 
titles etc.

Chapter 5 of Lucene in Action goes into depth on using filters to 'search 
within search'.
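
To make the idea concrete, here is a rough sketch in plain Python. It is not 
the Lucene.Net API: Doc, wide_search, range_filter, term_filter and 
apply_filters are names made up for illustration, and the sample data is 
invented. It only shows the shape of "wide search first, then narrow with 
filters"; in Lucene the filters would be handed to the searcher along with the 
query rather than applied by hand.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Iterable, List, Optional


@dataclass(frozen=True)
class Doc:
    """Hypothetical flattened document; the field names are assumptions."""
    doc_type: str                     # "person" or "company"
    name: str
    location: Optional[str] = None
    job_title: Optional[str] = None
    start_date: Optional[date] = None


Filter = Callable[[Doc], bool]


def wide_search(docs: Iterable[Doc], doc_type: Optional[str] = None,
                location: Optional[str] = None) -> List[Doc]:
    """Find X (people/companies/both) from location Y (or any location)."""
    return [d for d in docs
            if (doc_type is None or d.doc_type == doc_type)
            and (location is None or d.location == location)]


def range_filter(field_name: str, lower, upper) -> Filter:
    """Accept docs whose field lies in [lower, upper]; missing values fail."""
    def accept(doc: Doc) -> bool:
        value = getattr(doc, field_name)
        return value is not None and lower <= value <= upper
    return accept


def term_filter(field_name: str, wanted) -> Filter:
    """Accept docs whose field equals the wanted value exactly."""
    return lambda doc: getattr(doc, field_name) == wanted


def apply_filters(hits: Iterable[Doc], filters: Iterable[Filter]) -> List[Doc]:
    """Search within search: keep only hits that pass every filter."""
    filters = list(filters)
    return [d for d in hits if all(f(d) for f in filters)]


# Invented sample data.
DOCS = [
    Doc("person", "Alice", "London", ".Net Developer", date(2009, 3, 1)),
    Doc("person", "Bob", "London", "Tester", date(2005, 6, 1)),
    Doc("person", "Carol", "Sydney", ".Net Developer", date(2009, 9, 1)),
    Doc("company", "Acme", "London"),
]

hits = wide_search(DOCS, doc_type="person", location="London")
narrowed = apply_filters(hits, [
    term_filter("job_title", ".Net Developer"),
    range_filter("start_date", date(2008, 1, 1), date(2010, 12, 31)),
])
print([d.name for d in narrowed])
```

Each facet the user selects becomes one more filter in the list, so the wide 
query itself never has to change.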


-----Original Message-----
From: Prescott Nasser [mailto:geobmx...@hotmail.com] 
Sent: 16 December 2011 19:30
To: lucene-net-user@lucene.apache.org
Subject: RE: [Lucene.Net] What would be the best way to index and search my 
data?

I guess there could be a couple of ways to go about this. Let me throw out an 
idea (that likely isn't perfect) and hopefully some others can jump in and 
improve or correct it.

So first, regarding people vs companies: I think that's really 
straightforward. People documents have their fields, companies have their 
fields, and then you probably want one field that specifies the type of 
document.

The relation is where all the complexity is, and I think you probably want to 
think about which queries are most likely. There are also likely only four 
options: do you attach employment history to the person document, to the 
company document, make relations their own documents, or (not one I like) keep 
that data in both locations?

I would be more inclined to attach work history details to the people than the 
company. Basically a list of employment history records each with an id that 
links to the company record. Comparing that to your sample queries would mean 
that you can answer most of them without cross referencing between employee and 
company (the one that would require it is the .net developers, last two years, 
in London - assuming London is part of the company.)
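
A rough sketch of that layout, in plain Python rather than Lucene.Net (the 
Job, Person and Company classes, the field names and the sample data are all 
invented for illustration). It assumes "between 2008 and 2010" means the job 
overlaps that period. The first query is answered from the person documents 
alone; the second has to follow the company id because the city lives on the 
company record.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Job:
    """One employment history record, attached to a person."""
    company_id: str   # links to the company record
    role: str
    start_year: int
    end_year: int


@dataclass
class Person:
    name: str
    gender: str
    jobs: List[Job] = field(default_factory=list)


@dataclass
class Company:
    company_id: str
    name: str
    city: str


def worked_as(people: List[Person], role: str,
              from_year: Optional[int] = None,
              to_year: Optional[int] = None) -> List[Person]:
    """Needs person documents only. A job matches if it overlaps the period."""
    def matches(job: Job) -> bool:
        return (job.role == role
                and (from_year is None or job.end_year >= from_year)
                and (to_year is None or job.start_year <= to_year))
    return [p for p in people if any(matches(j) for j in p.jobs)]


def worked_as_in_city(people: List[Person], companies: Dict[str, Company],
                      role: str, city: str) -> List[Person]:
    """Needs a cross reference: role is on the job, city is on the company."""
    return [p for p in people
            if any(j.role == role and companies[j.company_id].city == city
                   for j in p.jobs)]


# Invented sample data.
COMPANIES = {c.company_id: c for c in [
    Company("c1", "Acme", "London"),
    Company("c2", "Globex", "Sydney"),
]}
PEOPLE = [
    Person("Alice", "F", [Job("c1", ".Net Developer", 2007, 2009)]),
    Person("Bob", "M", [Job("c2", ".Net Developer", 2009, 2011),
                        Job("c1", "Tester", 2004, 2006)]),
    Person("Dave", "M", [Job("c1", ".Net Developer", 2011, 2012)]),
]

in_period = worked_as(PEOPLE, ".Net Developer", 2008, 2010)
in_london = worked_as_in_city(PEOPLE, COMPANIES, ".Net Developer", "London")
print([p.name for p in in_period], [p.name for p in in_london])
```

Note that Bob has been a .Net Developer and has worked in London, but never 
both in the same job, so the role and the city have to be matched against the 
same history record rather than against the person as a whole.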

We have a simple faceted search in contrib, which you'll need.

Also, I'm the one who pointed you to this list from SO, and it's been very 
quiet here; for that I apologize. Because our index is the same as Java 
Lucene's and their mailing list is far more active, they could probably help 
you with design if you aren't satisfied with the help you get here. The one 
thing to be careful about is that we are behind them in development, so some of 
the contrib projects they might suggest we might not have. If that's the case, 
drop us a line and I can help get those ported over quickly.

-Prescott
________________________________
From: Andy McCluggage
Sent: 12/14/2011 5:22 AM
To: lucene-net-user@lucene.apache.org
Subject: [Lucene.Net] What would be the best way to index and search my data?

I've found multiple questions that have been asked in various places online 
(including StackOverflow 
<http://stackoverflow.com/questions/8491779/what-would-be-the-best-way-to-index-and-search-my-data-using-lucene> ) 
that ask questions along the lines of "How can I index and then search 
relational data in Lucene".
Quite rightly these questions are met with the standard response that Lucene is 
not designed to model data like this. This quote I found sums it up...

"A Lucene Index is a Document Store. In a Document Store, a single document 
represents a single concept with all necessary data stored to represent that 
concept (compared to that same concept being spread across multiple tables in 
an RDBMS requiring several joins to re-create)."

So I will not ask that question and instead provide my high level requirements 
and see if any Lucene gurus out there can help me.

*       We have data on People (Name, Gender, DOB, Nationality, etc)

*       And data on Companies (Name, Country, City, etc).

*       We also have data about how these two types of entity relate to
each other where a person worked at the company (Person, Company, Role, Date 
Started, Date Ended, etc).



We have two entities - Person and Company - that have their own properties and 
then properties exist for the many-to-many link between them.

Some example searches could be as follows...

*       Find all Companies in Australia

*       Find all People born between two dates

*       Find all People who have worked as a .Net Developer

*       Find all males who have worked as a .Net Developer in London.

*       Find all People who have worked as a .Net Developer between 2008
and 2010





The criteria span all three sets of data. Our requirement is to provide a 
Faceted Search <http://en.wikipedia.org/wiki/Faceted_search>
over the data that accepts any combination of the various properties, of which 
I have given some examples.



I am aware of the idea that the Index should be constructed with the search in 
mind. But I can't seem to come up with a sensible index that would meet all the 
combinations of search criteria.

*       What classes native to Lucene or what extension points can we
make use of?

*       Are there established techniques for doing this kind of
thing?

*       Are there any third-party open source contributions that I have
missed that will help us here?



For now I won't describe the scenarios we have considered because I don't want 
to bloat out this question and make it too intimidating.
Please ask me to elaborate where necessary.

Many thanks in advance,

Andy


