Many times I've written ad-hoc code that pulls in data from an RDBMS and builds a Lucene index. The use case is a typical database-driven dynamic website which would be a hassle to spider (say, due to tricky authentication).

I had a feeling this had been done in a general manner but didn't see any code in the sandbox, nor did any searches turn it up.

I've spent a few minutes thinking this through. What I'd expect to be able to configure is:

1. JDBC Driver + conn params
2. Query to do a one-time full index
3. Query to show new records
4. Query to show changed records
5. Query to show deleted records
6. Query columns to Lucene Field name mapping
7. "Type" of each field name (e.g. the equivalent of the args to the Field constructor)


So a simple example, taking item 2, is:

        query: "select url, name, body from foo"

(now the column to field mapping)
        col 1 => url
        col 2 => title
        col 3 => contents

(now the field types for each named field)

        url => Field( ...store=true, index=false)
      title => Field( ...store=true, index=true)
   contents => Field( ...store=false, index=true)
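To make that concrete, here is a rough Java sketch of the config-driven mapping, not existing code. It uses no Lucene or JDBC classes so that it runs on its own: `FieldSpec`, `IndexedField`, `exampleMapping` and `toFields` are hypothetical names made up for illustration, and a row is modelled as a plain list of strings where a real version would read from a `java.sql.ResultSet` and build Lucene `Field` objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DbIndexConfigSketch {

    // Hypothetical stand-in for the store/index args of the Lucene Field constructor.
    static final class FieldSpec {
        final String name;
        final boolean store;
        final boolean index;

        FieldSpec(String name, boolean store, boolean index) {
            this.name = name;
            this.store = store;
            this.index = index;
        }
    }

    // One column value paired with the spec that says how it should be indexed.
    static final class IndexedField {
        final FieldSpec spec;
        final String value;

        IndexedField(FieldSpec spec, String value) {
            this.spec = spec;
            this.value = value;
        }
    }

    // Item 2 from the list: the one-time full index query.
    static final String FULL_INDEX_QUERY = "select url, name, body from foo";

    // Items 6 and 7: column position (1-based, as in JDBC) to field name and type.
    static Map<Integer, FieldSpec> exampleMapping() {
        Map<Integer, FieldSpec> mapping = new LinkedHashMap<>();
        mapping.put(1, new FieldSpec("url", true, false));
        mapping.put(2, new FieldSpec("title", true, true));
        mapping.put(3, new FieldSpec("contents", false, true));
        return mapping;
    }

    // Turns one result row into the fields of one document, driven only by the mapping.
    static List<IndexedField> toFields(List<String> row, Map<Integer, FieldSpec> mapping) {
        List<IndexedField> fields = new ArrayList<>();
        for (Map.Entry<Integer, FieldSpec> entry : mapping.entrySet()) {
            String value = row.get(entry.getKey() - 1); // convert 1-based column to 0-based list index
            fields.add(new IndexedField(entry.getValue(), value));
        }
        return fields;
    }

    public static void main(String[] args) {
        // A made-up row standing in for one result of FULL_INDEX_QUERY.
        List<String> row = Arrays.asList("http://example.com/1", "First item", "Body text of the first item");
        for (IndexedField f : toFields(row, exampleMapping())) {
            System.out.println(f.spec.name + " (store=" + f.spec.store
                    + ", index=" + f.spec.index + ") = " + f.value);
        }
    }
}
```

The point of the design is that the indexing loop never mentions a column or field by name; everything specific to the `foo` table lives in the mapping.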



And voilà, nice, elegant, data-driven indexing.
Does it exist?
Should it? :)

PS
I know that in the more general form, "query" needs to be replaced by "queries" above, the queries for new and changed records may need some timestamp variable expansion, and possibly the queries need "paging" to deal with lamo DBs like MySQL that don't have cursors for large result sets...
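
The timestamp expansion and paging could look roughly like the Java sketch below. Everything in it is an assumption for illustration: the `${last_run}` placeholder syntax, the `modified` column, and the helper names `expandLastRun` and `page`. The paging uses the `LIMIT ... OFFSET ...` form that MySQL accepts. A real version would probably bind the timestamp through a `PreparedStatement` parameter rather than pasting it into the SQL string.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

class QueryExpansionSketch {

    // Hypothetical placeholder that a configured query can use for "time of the last index run".
    static final String LAST_RUN = "${last_run}";

    private static final DateTimeFormatter SQL_TIMESTAMP =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Replaces the placeholder with a quoted SQL timestamp literal.
    static String expandLastRun(String template, LocalDateTime lastRun) {
        return template.replace(LAST_RUN, "'" + SQL_TIMESTAMP.format(lastRun) + "'");
    }

    // Appends a LIMIT/OFFSET clause so a large result set is fetched one page at a time.
    // pageNumber is 0-based; the query should have a stable ORDER BY for paging to be reliable.
    static String page(String query, int pageSize, int pageNumber) {
        long offset = (long) pageSize * pageNumber;
        return query + " LIMIT " + pageSize + " OFFSET " + offset;
    }

    public static void main(String[] args) {
        // "modified" is a made-up column; the original example table only lists url, name, body.
        String changedQuery =
                "select url, name, body from foo where modified > ${last_run} order by url";
        String expanded = expandLastRun(changedQuery, LocalDateTime.of(2004, 1, 2, 3, 4, 5));
        System.out.println(page(expanded, 100, 0));
        System.out.println(page(expanded, 100, 1));
    }
}
```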




