I've found multiple questions asked in various places online (including
StackOverflow
<http://stackoverflow.com/questions/8491779/what-would-be-the-best-way-to-index-and-search-my-data-using-lucene>)
along the lines of "How can I index and then search relational data in
Lucene?". Quite rightly, these questions are met with the standard
response that Lucene is not designed to model data like this. This quote
I found sums it up...

"A Lucene Index is a Document Store. In a Document Store, a single
document represents a single concept with all necessary data stored to
represent that concept (compared to that same concept being spread
across multiple tables in an RDBMS requiring several joins to
re-create)."

So I will not ask that question. Instead I will describe my high-level
requirements and see if any Lucene gurus out there can help me.

*       We have data on People (Name, Gender, DOB, Nationality, etc)

*       And data on Companies (Name, Country, City, etc).

*       We also have data about how these two types of entity relate to
each other where a person worked at the company (Person, Company, Role,
Date Started, Date Ended, etc).

 

So we have two entities - Person and Company - each with its own
properties, plus further properties on the many-to-many link between
them.
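
To make the shape of the data concrete, here is a rough Python sketch of
the three record types. The field names, the ID columns and the sample
rows are assumptions made up for illustration only; they are not our
real schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Person:
    person_id: int  # hypothetical key, not part of the list above
    name: str
    gender: str
    dob: date
    nationality: str


@dataclass(frozen=True)
class Company:
    company_id: int  # hypothetical key, not part of the list above
    name: str
    country: str
    city: str


@dataclass(frozen=True)
class Employment:
    # The many-to-many link, which carries properties of its own.
    person_id: int
    company_id: int
    role: str
    date_started: date
    date_ended: Optional[date]  # None means the role is still current


# Made-up sample rows, purely for illustration.
people = [Person(1, "Alice Example", "F", date(1980, 5, 17), "British")]
companies = [Company(10, "Example Pty Ltd", "Australia", "Sydney")]
employments = [
    Employment(1, 10, ".Net Developer", date(2008, 3, 1), date(2010, 6, 30)),
]
```

The point is that Role, Date Started and Date Ended belong to neither
the Person nor the Company, only to the link between them.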

Some example searches could be as follows...

*       Find all Companies in Australia

*       Find all People born between two dates

*       Find all People who have worked as a .Net Developer

*       Find all males who have worked as a .Net Developer in London

*       Find all People who have worked as a .Net Developer between 2008
and 2010

 

 

The criteria span all three sets of data. Our requirement is to provide
a Faceted Search <http://en.wikipedia.org/wiki/Faceted_search> over the
data that accepts any combination of the various properties; the
searches above are only a few examples.
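
To show what "spanning all three sets" means in practice, here is the
"males who have worked as a .Net Developer in London" search written as
a plain in-memory join in Python. This is not a Lucene solution, only a
statement of the semantics we need; the function name and all of the
data are made up for illustration.

```python
# Made-up rows; the keys mirror the properties listed earlier.
people = {
    1: {"name": "Bob Example", "gender": "M"},
    2: {"name": "Carol Example", "gender": "F"},
    3: {"name": "Dave Example", "gender": "M"},
}
companies = {
    10: {"name": "Example Ltd", "country": "UK", "city": "London"},
    11: {"name": "Example Pty Ltd", "country": "Australia", "city": "Sydney"},
}
# (person_id, company_id, role) rows for the many-to-many link.
employments = [
    (1, 10, ".Net Developer"),
    (2, 10, ".Net Developer"),
    (3, 11, ".Net Developer"),
    (3, 10, "Project Manager"),
]


def males_who_worked_as(role, city):
    """Return sorted names of males who held `role` at a company in `city`."""
    matches = set()
    for person_id, company_id, held_role in employments:
        person = people[person_id]
        company = companies[company_id]
        # Role and city must match on the SAME employment row.
        if (person["gender"] == "M"
                and held_role == role
                and company["city"] == city):
            matches.add(person["name"])
    return sorted(matches)


print(males_who_worked_as(".Net Developer", "London"))
```

Dave is excluded because his .Net Developer role was in Sydney and his
London role was as a Project Manager. As far as I can tell, an index
that simply flattens all of a person's roles and cities into one
document would match him, which is the kind of false positive we need
to avoid.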

 

I am aware of the idea that the index should be constructed with the
search in mind, but I can't seem to come up with a sensible index that
would meet all the combinations of search criteria. Specifically:

*       What classes native to Lucene, or what extension points, can we
make use of?

*       Are there any established techniques for doing this kind of
thing?

*       Are there any third-party open source contributions that I have
missed that will help us here?

 

For now I won't describe the scenarios we have considered because I
don't want to bloat out this question and make it too intimidating.
Please ask me to elaborate where necessary.

Many thanks in advance,

Andy

 

