Sorry for posting here; this didn't get through to solr-user for some reason... (didn't the Solr and Lucene mailing lists merge?)
Hi Solr experts,

We are contemplating a switch from our Lucene-based infrastructure to a Solr-based solution. I would appreciate some kick-start pointers on where to look, so that I do not have to read *everything* about Solr (it is quite a lot) :) Apologies in advance if these are dumb, RTFM-like questions.

We are dealing mainly with structured data: extremely short documents with a couple of fields, indexed from huge CSV files (200 million documents and up).

Q1: If one field in this CSV has the YYYYMMDD format and I want to index it as a DateField without wasting CPU cycles on the “YYYYMMDD” -> “ISO 8601” conversion (we do not want to use DIH):
- Should I introduce my own type, MyDateField, that extends/wraps DateField (TrieDateField) and overrides the toInternal()/toExternal() methods? The indexed format of the field would remain the same. The question then is: can Solr use such a MyDateField for faceting, searching and all the other goodies as if it were a pure DateField?
- Or should I deal with it in an UpdateRequestProcessor somehow?

Q2: We intend to try a 1-Master / N-Slaves configuration with Solr, and it looks like it can update the slaves with index changes “out of the box” (great!). My problem is the following: how can I distribute “my configuration” to the slaves? One example of such “configuration”: during a full update (complete re-index) we count symbol statistics for all characters appearing in particular fields, and these statistics are used to seed a battery of static HuffmanDecoders/Encoders that compress our stored fields. How do I distribute such “configuration”? Can the same process that distributes changed Lucene segments be “enhanced”, “copy-pasted” or whatever, so that it looks into some user-specified “folder root” and replicates the complete sub-tree along with the index? Versioning of such things can be done in user code. On the receiving side I just need to be notified that a “change happened in your files” in order to reload my “configuration bits”. I guess this is somehow already possible; people want to distribute their apps, not only the Lucene index?

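To make the “configuration” in Q2 concrete: the symbol statistics amount to per-character counts over the values of a field, collected during the full re-index. A minimal sketch follows; the class name SymbolStats and its methods are hypothetical, chosen only for illustration, and this is neither our production code nor any Solr API.

```java
// Hypothetical illustration: per-character frequency counts for one field.
// Counts like these would be the seed for a static Huffman encoder/decoder.
public class SymbolStats {
    // One counter per possible Java char value (0..65535).
    private final long[] counts = new long[Character.MAX_VALUE + 1];

    // Add every character of one field value to the statistics.
    public void add(String fieldValue) {
        for (int i = 0; i < fieldValue.length(); i++) {
            counts[fieldValue.charAt(i)]++;
        }
    }

    // Number of times the given character has been seen so far.
    public long count(char c) {
        return counts[c];
    }
}
```

The resulting table is small and changes only on a full re-index, which is why it would be enough to ship it to the slaves as a file next to the index.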
Q3: Our app uses the Lucene index both as a search index and as a database. In this app, the user issues a request that is nothing at all like a Lucene search request. Our users do not know how to write queries. End-user code sends only key-value pairs (field name, value) to Solr, and internally we do the following:
1. Rewrite this “UserRequest” into *many* Lucene queries.
2. For each hit, fetch one stored field containing our original document from the CSV in compressed form, and decompress it.
3. Classify these Lucene responses into “Hit Classes” (we add a “Hit type” field to the response).
4. Cluster such hits (“classID” field in the response).

Where should I insert all this work into the Solr Request->Response chain? A RequestHandler?

Thanks in advance,
eks
