(Re)building the index separately (i.e. on a different computer) and then replacing the active index may be an option.

David Whalen
What we're looking for is a way to inject *without* using
curl, or wget, or any other http-based communication.  We'd
like for the HTTP daemon to only handle search requests, not
indexing requests on top of them.

Plus, I have to believe there's a faster way to get documents
into solr/lucene than using curl....

Condensing the loader into a single executable sounds right if you have performance problems. ;-)

You could also try adding multiple <doc>s in a single POST if you find the overhead is in TCP connection setup, though for localhost connections that should be minimal.
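
As a rough sketch of the batching idea, the snippet below builds one <add> message holding several <doc>s, so a single POST carries many records. The helper name build_add_batch and the field names (id, title) are made up for illustration; your schema will differ. It only builds the XML; sending it is left to whatever client you already use.

```python
import xml.etree.ElementTree as ET


def build_add_batch(records):
    """Return one Solr XML <add> message containing a <doc> per record.

    `records` is a list of dicts mapping field name to value. The field
    names are hypothetical; use the ones defined in your own schema.
    """
    add = ET.Element("add")
    for record in records:
        doc = ET.SubElement(add, "doc")
        for name, value in record.items():
            field = ET.SubElement(doc, "field", name=name)
            # ElementTree escapes &, < and > in the text when serializing.
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")


# Hypothetical rows, e.g. as fetched from the database.
rows = [
    {"id": 1, "title": "first"},
    {"id": 2, "title": "second"},
]
batch_xml = build_add_batch(rows)
print(batch_xml)
```

The batch size is a trade-off: larger batches mean fewer requests, but more memory held per request and more work lost if one post fails.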

If you're already local to the Solr server, you might check out the CSV slurper (http://wiki.apache.org/solr/UpdateCSV). It's a little specialized.
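
If you go the CSV route, the loader's job reduces to dumping rows as CSV. A minimal sketch follows, assuming the handler reads field names from a header line (check the wiki page above for the exact options); the helper name rows_to_csv and the columns id and title are hypothetical.

```python
import csv
import io


def rows_to_csv(fieldnames, rows):
    """Serialize dict rows as CSV text with a header line of field names."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    # The csv module quotes any value containing a comma, quote or newline.
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical rows, e.g. as fetched from the database.
csv_text = rows_to_csv(
    ["id", "title"],
    [
        {"id": 1, "title": "first"},
        {"id": 2, "title": "second, with comma"},
    ],
)
print(csv_text)
```

Writing csv_text to a file on the Solr machine keeps the document payload itself off the network, which is the main attraction when the loader and Solr share a host.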

And then there's of course the question of "are you doing full re-indexing or incremental indexing of changes?"


Kevin Holmes
I inherited an existing (working) solr indexing script that
runs like

Python script queries the mysql DB then calls bash script

Bash script performs a curl POST submit to solr

We're injecting about 1000 records / minute (constantly),
pushing the edge of our CPU / RAM limitations.

I'm in the process of building a Perl script to use DBI and lwp::simple::post that will perform this all from a single script (instead of 3).

Two specific questions

1: Does anyone have a clever (or better) way to perform
this process?

2: Is there a way to inject into solr without using POST /
curl / http?

Admittedly, I'm no solr expert - I'm starting from someone else's setup, trying to reverse-engineer my way out. Any input would be greatly appreciated.

