OK, I was off on a tangent. We've had several discussions where people were effectively trying to replace a RDBMS with Lucene and finding out it that RDBMSs are very good at what they do <G>...
But in general, I'd probably approach it by doing the RDBMS work first and indexing the result. I think this is your option (2). Yes, this will de-normalize a bunch of your data and you'll chew up some space, but disk space is cheap. Very cheap <G>. One thing to remember, though, that took me a while to get used to, especially when I had my database hat on. There's no requirement that every document in a Lucene index have the same fields. Conceptually, you can store *all* your tables in the same index. So a document for table one has fields table_1_field1 table_1_field2 table_1_field3. "documents" for table two have fields table_2_field1 table_2_field2 etc..... These documents will never interfere with each other during searches because they share no fields (and each query goes against a particular field). I mention this because your maintenance will be much easier if you only have one index <G>.... Best Erick On 2/22/07, Mohammad Norouzi <[EMAIL PROTECTED]> wrote:
Thanks Erick but we have to because we need to execute very big queries that create traffik network and are very very slow. but with lucene we do it in some milliseconds. and now we indexed our needed information by joining tables. it works fine, besides, it returns the exact result as we can get from database. we indexed about one million records. but let me say, we are not using it instead of database, we use it to generate some dynamic reports that if we did it by sql queries, it would take about 15 minutes. On 2/22/07, Erick Erickson <[EMAIL PROTECTED]> wrote: > > don't do either one <G>.... Search this mail archive for discussions of > databases, there are several long threads discussing this along with > various > options on how to make this work. See particularly a mail entitled > *Oracle/Lucene > integration -status- *and any discussions participated in by Marcelo > Ochoa. > > But, in general, Lucene is a text search engine, NOT a RDBMS. When you > start > saying "keep all relation in order to get right result", it sounds like > you're trying to use Lucene as a RDBMS. It doesn't do this very well, > that's > not what it was built for. There are several options... > > get clever with your index such that you don't do anything like join > tables. This implies that you re-design your data layout, probably > de-normalizing lots of data, etc. > > Use a hybrid solution. That is, use Lucene to search text and then do > whatever further relational processing you need in the database. You need > to > store enough information in the Lucene documents to be able to query the > database. > > stick with a database if it works for you already. > > In general, it's a mis-use of lucene to try to get RDBMS behavior out of > it. > When you find yourself trying to do this, take a few minutes and ask > yourself if this design is appropriate, and continue only if you can > answer > in the affirmative... > > Best > Erick > > On 2/22/07, Mohammad Norouzi <[EMAIL PROTECTED]> wrote: > > > > Hello > > In our application we have to index the database tables, there is two > way > > to > > make this > > > > 1- index each table in a separate directory and then keep all relation > in > > order to get right result. in this method, we should use filters to > > overcome > > the problem of searching on another search result. > > 2. joining two or more tables and index the result of join query. > > > > which approach is better, reliable, has acceptable performance. > > > > thanks > > -- > > Regards, > > Mohammad > > > -- Regards, Mohammad