I wonder if the following approach would work and therefore simplify your view of how the data should be indexed.
The approach would be to run a wide search and then use filters built from the faceted criteria data. From the examples you gave, the initial search would be:

Find X (people/companies/both) from location Y (/null)

Then you would use a filter or filters to give you a search within a search, for example using a FieldCacheRangeFilter to limit date ranges, salaries, job titles, etc. Chapter 5 of Lucene in Action goes into depth on using filters to "search within search".

-----Original Message-----
From: Prescott Nasser [mailto:geobmx...@hotmail.com]
Sent: 16 December 2011 19:30
To: lucene-net-user@lucene.apache.org
Subject: RE: [Lucene.Net] What would be the best way to index and search my data?

I guess there could be a couple of ways to go about this. Let me throw out an idea (that likely isn't perfect), and hopefully others can jump in and improve or correct it.

First, regarding people vs. companies: I think that's really straightforward. Person documents have their fields, company documents have their fields, and then you probably want one field that specifies the type of document.

The relation is where all the complexity is, and you probably want to think about which queries are most likely. There are also likely only four options: attach employment history to the person document, attach it to the company document, make each relation its own document, or (not one I like) store that data in both places.

I would be more inclined to attach work history details to the people than to the companies: basically a list of employment history records, each with an id that links to the company record. Compared against your sample queries, this would let you answer most of them without cross-referencing between employee and company (the one that would require it is the .Net developers, last two years, in London, assuming London is part of the company record).

We have simple faceted search in contrib, which you'll need.
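As a rough illustration of the person-centric layout described above, here is a minimal in-memory sketch in Python rather than Lucene.Net code. The type field, the embedded list of employment records and the company id link come from the suggestion. The class names, field names, sample data and the worked_as / worked_as_in_city helpers are hypothetical, and plain Python objects stand in for Lucene documents.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

# Hypothetical stand-ins for Lucene documents; names are illustrative only.

@dataclass
class Employment:
    company_id: str          # links back to the company document
    role: str
    started: date
    ended: Optional[date]    # None = still employed

@dataclass
class Doc:
    doc_id: str
    doc_type: str            # "person" or "company": the document type field
    fields: dict
    history: list = field(default_factory=list)  # only populated on person docs

def worked_as(doc, role, start=None, end=None):
    """True if ONE history record has this role and overlaps [start, end]."""
    for job in doc.history:
        if job.role != role:
            continue
        job_end = job.ended or date.max
        if start is not None and job_end < start:
            continue
        if end is not None and job.started > end:
            continue
        return True
    return False

docs = [
    Doc("c1", "company", {"name": "Acme", "country": "Australia", "city": "Sydney"}),
    Doc("c2", "company", {"name": "Initech", "country": "UK", "city": "London"}),
    Doc("p1", "person", {"name": "Alice", "gender": "F"},
        [Employment("c2", ".Net Developer", date(2007, 3, 1), date(2009, 6, 30))]),
    Doc("p2", "person", {"name": "Bob", "gender": "M"},
        [Employment("c1", "Tester", date(2008, 1, 1), None),
         Employment("c2", ".Net Developer", date(2011, 1, 1), None)]),
]

people = [d for d in docs if d.doc_type == "person"]
companies = {d.doc_id: d for d in docs if d.doc_type == "company"}

# Answered from person documents alone, no join needed.
devs_2008_2010 = [d.doc_id for d in people
                  if worked_as(d, ".Net Developer", date(2008, 1, 1), date(2010, 12, 31))]

def worked_as_in_city(doc, role, city):
    # The case flagged above: the city lives on the company document,
    # so each history record's company_id has to be looked up.
    return any(job.role == role and companies[job.company_id].fields["city"] == city
               for job in doc.history)

london_devs = [d.doc_id for d in people if worked_as_in_city(d, ".Net Developer", "London")]

print(devs_2008_2010)  # ['p1']
print(london_devs)     # ['p1', 'p2']
```

Keeping each employment record whole matters here: in the sample data Bob was a Tester from 2008 and a .Net Developer only from 2011, so he is correctly excluded from the 2008 to 2010 query. If his history were flattened into independent multi-valued role and date fields on a single Lucene document, the link between role and dates would be lost and he could match.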
Also, I'm the one who pointed you to this list from SO, and it's been very quiet here; for that I apologize. Because our index format is the same as Java Lucene's and their mailing list is far more active, they could probably help you with the design if you aren't satisfied with the help you get here. The one thing to be careful about is that we are behind them in development, so some of the contrib projects they might suggest may not exist in our port yet. If that's the case, drop us a line and I can help get those ported over quickly.

-Prescott

________________________________
From: Andy McCluggage
Sent: 12/14/2011 5:22 AM
To: lucene-net-user@lucene.apache.org
Subject: [Lucene.Net] What would be the best way to index and search my data?

I've found multiple questions asked in various places online (including StackOverflow <http://stackoverflow.com/questions/8491779/what-would-be-the-best-way-to-index-and-search-my-data-using-lucene>) along the lines of "How can I index and then search relational data in Lucene?". Quite rightly, these questions are met with the standard response that Lucene is not designed to model data like this. This quote I found sums it up...

"A Lucene Index is a Document Store. In a Document Store, a single document represents a single concept with all necessary data stored to represent that concept (compared to that same concept being spread across multiple tables in an RDBMS requiring several joins to re-create)."

So I will not ask that question; instead I'll give my high-level requirements and see if any Lucene gurus out there can help me.

* We have data on People (Name, Gender, DOB, Nationality, etc).
* And data on Companies (Name, Country, City, etc).
* We also have data about how these two types of entity relate to each other where a person worked at the company (Person, Company, Role, Date Started, Date Ended, etc).
We have two entities, Person and Company, that have their own properties, and then properties exist for the many-to-many link between them. Some example searches could be as follows...

* Find all Companies in Australia
* Find all People born between two dates
* Find all People who have worked as a .Net Developer
* Find all males who have worked as a .Net Developer in London
* Find all People who have worked as a .Net Developer between 2008 and 2010

The criteria span all three sets of data. Our requirement is to provide a Faceted Search <http://en.wikipedia.org/wiki/Faceted_search> over the data that accepts any combination of the various properties, of which I have given some examples.

I am aware of the idea that the index should be constructed with the search in mind, but I can't seem to come up with a sensible index that would meet all the combinations of search criteria.

* What classes native to Lucene, or what extension points, can we make use of?
* Are there established techniques for doing this kind of thing?
* Are there any third-party open source contributions that I have missed that will help us here?

For now I won't describe the scenarios we have considered because I don't want to bloat out this question and make it too intimidating. Please ask me to elaborate where necessary.

Many thanks in advance,
Andy
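To tie the "wide search plus filters" suggestion from the first reply to the example searches above, here is a minimal Python sketch. Sets of document ids stand in for Lucene filters and functools.lru_cache stands in for filter caching; it does not use FieldCacheRangeFilter or any other Lucene class. The function names, the city field on people and the sample data are all hypothetical.

```python
from collections import Counter
from datetime import date
from functools import lru_cache

# Hypothetical documents; field names and values are illustrative only.
DOCS = {
    1: {"type": "person", "gender": "M", "dob": date(1980, 5, 1), "city": "London"},
    2: {"type": "person", "gender": "F", "dob": date(1975, 2, 9), "city": "London"},
    3: {"type": "person", "gender": "M", "dob": date(1990, 7, 3), "city": "Sydney"},
    4: {"type": "company", "country": "Australia", "city": "Sydney"},
    5: {"type": "company", "country": "UK", "city": "London"},
}

def wide_search(doc_type=None, city=None):
    """Broad first pass: 'Find X (people/companies/both) from location Y (/null)'."""
    return {
        doc_id for doc_id, d in DOCS.items()
        if (doc_type is None or d["type"] == doc_type)
        and (city is None or d.get("city") == city)
    }

@lru_cache(maxsize=None)
def range_filter(field, lo, hi):
    """Ids whose field lies in [lo, hi]; cached so repeated facet clicks reuse it."""
    return frozenset(i for i, d in DOCS.items() if field in d and lo <= d[field] <= hi)

@lru_cache(maxsize=None)
def term_filter(field, value):
    """Ids whose field equals value exactly."""
    return frozenset(i for i, d in DOCS.items() if d.get(field) == value)

def search_within(hits, *filters):
    """'Search within search': narrow an existing hit set by intersecting filters."""
    result = set(hits)
    for f in filters:
        result &= f
    return result

def facet_counts(hits, field):
    """Count of each value of `field` among the current hits (the facet sidebar)."""
    return Counter(DOCS[i][field] for i in hits if field in DOCS[i])

london_people = wide_search("person", "London")  # {1, 2}
born_1978_85 = search_within(
    london_people, range_filter("dob", date(1978, 1, 1), date(1985, 12, 31)))  # {1}
aussie_companies = search_within(
    wide_search("company"), term_filter("country", "Australia"))  # {4}
gender_facets = facet_counts(london_people, "gender")  # M: 1, F: 1
```

In this sketch each filter is computed independently of the wide search, so the same cached filter can be intersected with any base result. That independence is what lets arbitrary combinations of facet criteria be layered on without designing a separate index per query.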