[
https://issues.apache.org/jira/browse/SOLR-469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608421#action_12608421
]
Grant Ingersoll commented on SOLR-469:
--------------------------------------
Patch applies cleanly and tests pass, although I notice several @ignore
annotations in there. Docs look good on my preliminary perusal. I've only
started looking at things, and have a lot of reading to catch up on, so please
take these first comments with a grain of salt, as the English saying goes...
I'd suggest that, instead of relying on MySQL in TestJdbcDataSource, we use an
embedded Derby or some sort of JDBC mock. I suggest Derby mainly b/c it's
already ASF and I don't want to bother looking up licenses for HSQL or any of
the others that might work.
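For the test, something like this is roughly what I have in mind (just a
sketch; the table and data are made up, not from the patch):
{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class EmbeddedDerbySketch {
  public static void main(String[] args) throws Exception {
    // Load the embedded driver explicitly (needed on pre-JDBC-4 JVMs)
    Class.forName("org.apache.derby.jdbc.EmbeddedDriver");
    // ";create=true" creates the database (a local directory) on first connect
    Connection conn =
        DriverManager.getConnection("jdbc:derby:testdb;create=true");
    Statement stmt = conn.createStatement();
    stmt.executeUpdate("CREATE TABLE docs (id INT PRIMARY KEY, title VARCHAR(64))");
    stmt.executeUpdate("INSERT INTO docs VALUES (1, 'hello')");
    ResultSet rs = stmt.executeQuery("SELECT id, title FROM docs");
    while (rs.next()) {
      System.out.println(rs.getInt("id") + " -> " + rs.getString("title"));
    }
    rs.close();
    stmt.close();
    conn.close();
  }
}
{code}
No external DB server to install, and the database is just a directory on
disk, so the test can delete it on teardown.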
Also, I notice several interfaces that have a number of methods on them. Have
you thought about abstract base classes instead? I know there is a whole big
debate over it, and people will argue that if you get the interface exactly
right, you should use interfaces. Nice in theory, but Lucene/Solr experience
suggests that rarely happens. Of course, I think the correct way is to
actually do both, as one can easily decorate an abstract base class with more
interfaces as needed. Just food for thought, b/c what will quickly happen
after release is that someone will need a new method on the DataSource or
something, and then we'll be stuck doing all kinds of workarounds for
back-compatibility reasons. The alternative is to clearly mark all interfaces
as experimental at this point and clearly note that we expect them to change.
We may even want to consider both! The other point, though, is that contrib
packages need not be held to the same standard as core when it comes to back
compat.
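To make the "do both" idea concrete, something along these lines (the shapes
here are stand-ins, not the patch's actual API):
{code:java}
// Stand-in for whatever Context the patch defines
interface Context { }

interface DataSource {
  void init(Context context);
  Object getData(String query);
  void close();
}

abstract class DataSourceBase implements DataSource {
  protected Context context;

  public void init(Context context) {
    this.context = context;
  }

  public void close() {
    // no-op by default; subclasses holding resources override this
  }

  // Hypothetical post-release addition: it gets a default here instead of
  // breaking everyone who implemented the raw interface.
  public void reset() {
    close();
    init(context);
  }
}
{code}
Users extend the base class and we keep the freedom to grow it; anyone who
truly needs the raw interface can still implement it, eyes open.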
What relation does the Context have to the HttpDataSource? In other words, the
DataSource init method takes a Context, meaning the HttpDataSource needs a
Context, yet on my first glance at the Context, it seems to be DB-related.
What if I wanted to slurp from a table on the fly? That is, I want to send a
select statement in my request and let the columns line up with fields where
they may (i.e. via dynamic fields, or by relying on something like select id,
colA as fieldA, colB as fieldB from MyTable;). Is that possible?
Interactive mode has a bit of a chicken-and-egg problem when it comes to JDBC,
in that the driver needs to be present in Solr/lib, right? So one can
currently only interactively configure a JDBC DataSource if the driver is
already in lib and loaded by the ClassLoader. If you haven't already, it might
actually be useful to show which drivers are present by using the
DriverManager.
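E.g., something along these lines would list what the ClassLoader can already
see:
{code:java}
import java.sql.Driver;
import java.sql.DriverManager;
import java.util.Enumeration;

public class ListDrivers {
  public static void main(String[] args) {
    // Enumerate every JDBC driver currently registered with DriverManager
    Enumeration<Driver> drivers = DriverManager.getDrivers();
    while (drivers.hasMoreElements()) {
      Driver d = drivers.nextElement();
      System.out.println(d.getClass().getName()
          + " " + d.getMajorVersion() + "." + d.getMinorVersion());
    }
  }
}
{code}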
In the JdbcDataSource, I'm not sure I follow the connection stuff. Can you
explain a bit? Also, what if I wanted to plug in my own connection pooling
library, as I may already have one set up for other things (if using Solr
embedded)?
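For instance, if the JDBC source could accept any javax.sql.DataSource, I
could hand it the pool I already run (the class and setter names below are
made up for illustration, not the patch's API):
{code:java}
import java.sql.Connection;
import java.sql.SQLException;

// Hypothetical shape, not the patch's actual class
public class JdbcDataSourceSketch {
  private javax.sql.DataSource pool;

  // Caller supplies any pooled implementation: DBCP, C3P0, a JNDI lookup...
  public void setConnectionFactory(javax.sql.DataSource pool) {
    this.pool = pool;
  }

  public Connection getConnection() throws SQLException {
    // Borrow from the pool; close() on the returned Connection typically
    // returns it to the pool rather than tearing it down.
    return pool.getConnection();
  }
}
{code}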
> Data Import RequestHandler
> --------------------------
>
> Key: SOLR-469
> URL: https://issues.apache.org/jira/browse/SOLR-469
> Project: Solr
> Issue Type: New Feature
> Components: update
> Affects Versions: 1.3
> Reporter: Noble Paul
> Assignee: Grant Ingersoll
> Fix For: 1.3
>
> Attachments: SOLR-469-contrib.patch, SOLR-469-contrib.patch,
> SOLR-469-contrib.patch, SOLR-469-contrib.patch, SOLR-469-contrib.patch,
> SOLR-469-contrib.patch, SOLR-469.patch, SOLR-469.patch, SOLR-469.patch,
> SOLR-469.patch, SOLR-469.patch, SOLR-469.patch, SOLR-469.patch,
> SOLR-469.patch, SOLR-469.patch
>
>
> We need a RequestHandler which can import data from a DB or other data
> sources into the Solr index. Think of it as an advanced form of the
> SqlUpload plugin (SOLR-103).
> The way it works is as follows.
> * Provide a configuration file (XML) to the Handler which takes in the
>   necessary SQL queries and mappings to a Solr schema
>   - It also takes in a properties file for the data source
>     configuration
> * Given the configuration it can also generate the Solr schema.xml
> * It is registered as a RequestHandler which can take two commands:
>   do-full-import and do-delta-import
>   - do-full-import - dumps all the data from the database into the
>     index (based on the SQL query in the configuration)
>   - do-delta-import - dumps all the data that has changed since the
>     last import (we assume a modified-timestamp column in the tables)
> * It provides an admin page
>   - where we can schedule it to run automatically at regular
>     intervals
>   - It shows the status of the Handler (idle, full-import,
>     delta-import)