Re: a question about indexing database tables

Erick Erickson Thu, 22 Feb 2007 05:34:06 -0800

OK, I was off on a tangent. We've had several discussions where people were
effectively trying to replace a RDBMS with Lucene and finding out it that
RDBMSs are very good at what they do <G>...


But in general, I'd probably approach it by doing the RDBMS work first and
indexing the result. I think this is your option (2). Yes, this will
de-normalize a bunch of your data and you'll chew up some space, but disk
space is cheap. Very cheap <G>.

One thing to remember, though, that took me a while to get used to,
especially when I had my database hat on. There's no requirement that every
document in a Lucene index have the same fields. Conceptually, you can store
*all* your tables in the same index. So a document for table one has fields
table_1_field1 table_1_field2 table_1_field3. "documents" for table two have
fields table_2_field1 table_2_field2 etc.....

These documents will never interfere with each other during searches because
they share no fields (and each query goes against a particular field).

I mention this because your maintenance will be much easier if you only have
one index <G>....

Best
Erick

On 2/22/07, Mohammad Norouzi <[EMAIL PROTECTED]> wrote:


Thanks Erick
but we have to because we need to execute very big queries that create
traffik network and are very very slow. but with lucene we do it in some
milliseconds. and now we indexed our needed information by joining tables.
it works fine, besides, it returns the exact result as we can get from
database. we indexed about one million records.
but let me say, we are not using it instead of database, we use it to
generate some dynamic reports that if we did it by sql queries, it would
take about 15 minutes.

On 2/22/07, Erick Erickson <[EMAIL PROTECTED]> wrote:
>
> don't do either one <G>.... Search this mail archive for discussions of
> databases, there are several long threads discussing this along with
> various
> options on how to make this work. See particularly a mail entitled
> *Oracle/Lucene
> integration -status- *and any discussions participated in by Marcelo
> Ochoa.
>
> But, in general, Lucene is a text search engine, NOT a RDBMS. When you
> start
> saying "keep all relation in order to get right result", it sounds like
> you're trying to use Lucene as a RDBMS. It doesn't do this very well,
> that's
> not what it was built for. There are several options...
> > get clever with your index such that you don't do anything like join
> tables. This implies that you re-design your data layout, probably
> de-normalizing lots of data, etc.
> > Use a hybrid solution. That is, use Lucene to search text and then do
> whatever further relational processing you need in the database. You
need
> to
> store enough information in the Lucene documents to be able to query the
> database.
> > stick with a database if it works for you already.
>
> In general, it's a mis-use of lucene to try to get RDBMS behavior out of
> it.
> When you find yourself trying to do this, take a few minutes and ask
> yourself if this design is appropriate, and continue only if you can
> answer
> in the affirmative...
>
> Best
> Erick
>
> On 2/22/07, Mohammad Norouzi <[EMAIL PROTECTED]> wrote:
> >
> > Hello
> > In our application we have to index the database tables, there is two
> way
> > to
> > make this
> >
> > 1- index each table in a separate directory and then keep all relation
> in
> > order to get right result. in this method, we should use filters to
> > overcome
> > the problem of searching on another search result.
> > 2. joining two or more tables and index the result of join query.
> >
> > which approach is better, reliable, has acceptable performance.
> >
> > thanks
> > --
> > Regards,
> > Mohammad
> >
>



--
Regards,
Mohammad

Re: a question about indexing database tables

Reply via email to