Hi, Michael,

I created a schema.xml where basically every field is of type "text" to
begin with. Do you use specialized types for authors or ISBNs or
other fields?
I use a different field for every MARC field I want to search;
moreover, there is a UDC-notation field which is split up
into atomic notations, so one complex UDC becomes 3+ Solr fields.
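For illustration, such a setup might look like the following minimal schema.xml fragment (all field names here are made up, and using an untokenized "string" type for ISBN and UDC parts is my assumption, not something stated in this thread):

```xml
<!-- Hypothetical schema.xml fragment: one field per searched MARC field -->
<fields>
  <field name="title"  type="text"   indexed="true" stored="true"/> <!-- e.g. MARC 245 -->
  <field name="author" type="text"   indexed="true" stored="true"/> <!-- e.g. MARC 100 -->
  <field name="isbn"   type="string" indexed="true" stored="true"/> <!-- e.g. MARC 020 -->
  <!-- one complex UDC notation split into atomic parts -->
  <field name="udc_main" type="string" indexed="true" stored="true"/>
  <field name="udc_aux1" type="string" indexed="true" stored="true"/>
  <field name="udc_aux2" type="string" indexed="true" stored="true"/>
</fields>
```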

How do you handle multi-value fields? Do you feed everything into a single
field (like "Smith, James ; Miller, Steve", as I have seen in a
colleague's pure Lucene implementation) or do you use the multiValued
feature of Solr?
I usually create multiple fields with the same name.
I do it in Lucene as well. There is no problem with
repeating fields (same name, different values, of course).
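To make the repeated-field approach concrete, here is a sketch (the field name is illustrative): declaring a field multiValued in schema.xml lets you repeat it in an update document, which corresponds to adding repeated fields in Lucene:

```xml
<!-- schema.xml: allow repeated values in one logical field -->
<field name="author" type="text" indexed="true" stored="true" multiValued="true"/>

<!-- update message: repeat the field, one value per occurrence -->
<add>
  <doc>
    <field name="author">Smith, James</field>
    <field name="author">Miller, Steve</field>
  </doc>
</add>
```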

What about boosting? I thought of giving the current year a boost="3.0"
and then 0.1 less for every year the title is older, down to 1.0 for a
21-year-old book. The idea is to have a sort that tends to promote
recent titles but still respects other aspects. Does this sound
reasonable or are there other ideas? I would be very interested in an
actual boosting-scheme from where I could start.
That sounds reasonable.
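As a concrete sketch of that decay (my own formalization of the scheme described above, not a tested scoring recipe): boost = max(1.0, 3.0 - 0.1 * age).

```java
// Sketch of the proposed year-based boost: 3.0 for the current year,
// dropping by 0.1 per year of age, floored at 1.0 (reached after 20 years).
public class YearBoost {
    public static double boost(int currentYear, int pubYear) {
        int age = Math.max(0, currentYear - pubYear);
        return Math.max(1.0, 3.0 - 0.1 * age);
    }

    public static void main(String[] args) {
        System.out.println(boost(2007, 2007)); // current year -> 3.0
        System.out.println(boost(2007, 1997)); // 10 years old -> 2.0
        System.out.println(boost(2007, 1950)); // older than 20 years -> floor of 1.0
    }
}
```

The computed value could be set as a document boost at index time or used as a function at query time; either way the floor keeps old titles searchable rather than burying them.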

We have a couple of databases that should eventually be indexed. Do you
build one huge index with an additional "database" field or is it
better to have every database in its own Solr instance?
Our projects usually build one index from different
sources - but it depends on the nature of your project.
We built an application into which we convert 110+
CD-ROMs (originally in Folio databases) - this covers
2,200,000+ XHTML pages, and there are separate search
forms for the different DBs. It is a Lucene project, not Solr.
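If you take the single-index route, one common pattern (my assumption; the field name and query values are made up) is to tag each record with its source database and restrict searches with a filter query:

```xml
<!-- schema.xml: tag every record with the database it came from -->
<field name="database" type="string" indexed="true" stored="true"/>

<!-- then a search form can restrict to one source, e.g.:
     /select?q=author:smith&amp;fq=database:catalog -->
```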

How do you fill the index? Our main database has about 700,000 records
and I don't know if I should build one huge XML-file and feed that into
SOLR or use a script that sends one record at a time with a commit after
every 1000 records or so. Or do something in between and split it into
chunks of a few thousand records each? What are your experiences? What
if a record gives an error? Will the whole file be rejected or just
that one record?
There is a Java command line tool, or you can look at VuFind's
solution. If you can, I suggest you prefer a pure Java
solution, writing directly to the Solr index (with the Solr
API), because it is much, much quicker than the PHP
(Rails, Perl) solutions, which are based on a web service
(and incur the PHP parsing and HTTP request overhead).
The PHP solution does nothing with Solr directly; it
uses the web service, and all the code could be rewritten
in Perl.
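The commit-every-N-records loop from the question above can be sketched as follows. The `Sink` interface here is a hypothetical stand-in for whatever client you end up with (a SolrJ client's add/commit calls, or a raw Lucene IndexWriter); all names are illustrative. Adding records one at a time also means a bad record only fails itself, not a whole batch file:

```java
import java.util.List;

// Hypothetical stand-in for a SolrJ client or Lucene IndexWriter.
interface Sink {
    void add(String record);   // index one record
    void commit();             // make pending adds visible / durable
}

public class BatchLoader {
    // Feeds records one at a time, committing every batchSize records.
    // Returns the number of commits performed.
    public static int load(List<String> records, Sink sink, int batchSize) {
        int commits = 0;
        int pending = 0;
        for (String record : records) {
            try {
                sink.add(record);          // a bad record fails here alone,
            } catch (RuntimeException e) { // not the whole batch
                System.err.println("skipped: " + record + " (" + e.getMessage() + ")");
                continue;
            }
            if (++pending >= batchSize) {
                sink.commit();
                commits++;
                pending = 0;
            }
        }
        if (pending > 0) { sink.commit(); commits++; } // flush the tail
        return commits;
    }
}
```

With 700,000 records and a batch size of 1000 this performs 700 commits; tuning the batch size trades memory against how much work is lost if the loader dies mid-run.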

Any other ideas, further reading, experiences...?
See the source files of the Solr-based solutions; there
are several, even in the library scene (PHP, Rails, Python).
More info:
http://del.icio.us/popular/solr


Peter Kiraly
http://www.tesuji.eu
