Synchronize large number of records with Solr

2007-09-14 Thread climbingrose
Hi all, I've been struggling to find a good way to synchronize Solr with a large number of records. We collect our data from a number of sources and each source produces around 50,000 docs. Each of these documents has a sourceId field indicating the source of the document. Now assuming we're

SolrSchema Fields Dynamically

2007-09-14 Thread Venkatraman S
Hi, I am new to Solr and am t -- Blog @ http://blizzardzblogs.blogspot.com

SolrSchema Fields Dynamically

2007-09-14 Thread Venkatraman S
Hi, I am new to Solr and am trying to implement a solution for indexing and searching using Embedded Solr. However, I have a question about SolrSchema: how do I generate the schema fields programmatically, instead of defining them in schema.xml? Regards, Venkat [apologies for sending a WIP

Re: Synchronize large number of records with Solr

2007-09-14 Thread Erik Hatcher
Cuong, I accomplished this (in Collex) by attaching a batch number to each document. When indexing a batch (or source), a GUID is generated and every document from that batch/source gets that same identifier attached to it. At the end of the indexing run, I delete everything with that
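A minimal Python sketch of the batch-GUID idea Erik describes, using Solr's XML update message format. Assumptions: the field name batchId is hypothetical, and the final step is taken to delete documents from the same sourceId whose batchId differs from the current run (the message above is cut off before it says exactly what gets deleted).

```python
import uuid
import xml.etree.ElementTree as ET


def build_batch_messages(source_id, docs):
    """Build Solr XML update messages for one indexing run of one source.

    Returns (batch_id, add_xml, delete_xml). 'sourceId' comes from the thread;
    'batchId' is a hypothetical extra field holding this run's GUID.
    """
    batch_id = uuid.uuid4().hex  # one GUID shared by every doc in this run

    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        # Tag every document with its source and the current run's GUID.
        for name, value in dict(doc, sourceId=source_id, batchId=batch_id).items():
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(value)

    # Assumed sweep step: remove this source's docs that were not re-sent in
    # this run, i.e. those still carrying an older batchId.
    delete = ET.Element("delete")
    query = ET.SubElement(delete, "query")
    query.text = f"sourceId:{source_id} AND -batchId:{batch_id}"

    return (batch_id,
            ET.tostring(add, encoding="unicode"),
            ET.tostring(delete, encoding="unicode"))
```

Posting add_xml, then delete_xml, then a commit to /update would, under these assumptions, leave only the documents sent in this run for that source.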

Batch indexing a large number of records

2007-09-14 Thread Thompson,Roger
Hi there! I am embarking on re-engineering an application using Solr/Lucene (If you'd like to see the current manifestation go to: fictionfinder.oclc.org). The database for this application consists of approximately 1.4 million records of varying size for the work record, and another database of

Re: Synchronize large number of records with Solr

2007-09-14 Thread climbingrose
Hi Erik, So in your case #1, documents are reindexed with this scheme - so if you truly need to skip a reindexing for some reason (why, though?) you'll need to come up with some other mechanism. [perhaps update could be enhanced to allow ignoring a duplicate id rather than reindexing?] It's

Authentication

2007-09-14 Thread jenix
Hi, What methods are available for user authentication? I'm using Jetty and php/curl and Basic HTTP Auth does not seem to work. I just need something simple so that only the Admin can add, update or delete documents. Regards, Jennifer Seaman

Re: Authentication

2007-09-14 Thread Bill Au
Add/Update, Commit/Optimize, Delete, and Delete by query in Solr are all done using the URL /update, so you should be able to protect that URL at the container level, outside of Solr. If you want, you can protect the query URL /select or the admin pages too. Container level authentication is transparent

Re: Authentication

2007-09-14 Thread jenix
When you say outside of Solr do you mean outside of solr.war? We finally got php/curl working with jetty's Basic Authentication. We had to unpack and repack solr.war to edit web.xml and it would have been nice to use some other method.
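For a client other than php/curl, a Python sketch of sending an update to an /update URL that the container protects with Basic authentication. The URL, user name and password are placeholders; the function only builds the request, and urllib.request.urlopen(req) would send it. With command-line curl, the -u user:password option produces the same Authorization header.

```python
import base64
import urllib.request


def build_update_request(url, xml_body, user, password):
    """Build (but do not send) a POST to a Basic-auth-protected /update URL.

    Assumes the container (e.g. Jetty) enforces Basic auth configured in
    web.xml; url, user and password are placeholders.
    """
    # Basic auth header value is base64("user:password").
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=xml_body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8",
                 "Authorization": f"Basic {token}"},
        method="POST",
    )
```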

Re: Query for German Special Characters (i.e., ä, ö, ß)

2007-09-14 Thread Marc Bechler
Hi, oops, the URIEncoding was lost during the update to tomcat 6.0.14. Thanks for the advice. But now I am really curious. After indexing the document from scratch, I have the effect that queries for this and is work fine, whereas queries for really and fünny do not return the result. Fünnily

hl.snippets per field override

2007-09-14 Thread Nathaniel E. Powell
In the wiki: http://wiki.apache.org/solr/HighlightingParameters#head-23ecd5061bc2c86a561f85dc1303979fe614b956 where it talks about the hl.snippets parameter, it says that it can be overridden on a per-field basis. I haven't been able to find any information in the documentation or on the

Re: hl.snippets per field override

2007-09-14 Thread Erik Hatcher
On Sep 14, 2007, at 12:33 PM, Nathaniel E. Powell wrote: http://wiki.apache.org/solr/HighlightingParameters#head-23ecd5061bc2c86a561f85dc1303979fe614b956 where it talks about the hl.snippets parameter, it says that it can be overridden on a per-field basis. I haven't been able to find any
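Per-field overrides of highlighting parameters use the f.<fieldname>.<parameter> form. A small Python sketch that builds such a query string; the field names title and body and the snippet counts are illustrative assumptions.

```python
from urllib.parse import urlencode


def highlight_query(q, fields, default_snippets, per_field):
    """Build highlighting request parameters with per-field hl.snippets overrides."""
    params = [("q", q), ("hl", "true"), ("hl.fl", ",".join(fields)),
              ("hl.snippets", str(default_snippets))]  # default for every hl.fl field
    # f.<field>.hl.snippets overrides the default for that one field only.
    params += [(f"f.{name}.hl.snippets", str(n)) for name, n in per_field.items()]
    return urlencode(params)
```

For example, highlight_query("solr", ["title", "body"], 1, {"body": 3}) asks for one snippet from title but up to three from body.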

Re: Batch indexing a large number of records

2007-09-14 Thread Erik Hatcher
On Sep 14, 2007, at 8:19 AM, Thompson,Roger wrote: I am embarking on re-engineering an application using Solr/Lucene (If you'd like to see the current manifestation go to: fictionfinder.oclc.org). The database for this application consists of approximately 1.4 million records of varying size

Re: Batch indexing a large number of records

2007-09-14 Thread Mike Klaas
On 14-Sep-07, at 5:19 AM, Thompson,Roger wrote: Hi there! I am embarking on re-engineering an application using Solr/Lucene (If you'd like to see the current manifestation go to: fictionfinder.oclc.org). The database for this application consists of approximately 1.4 million records of

Re: Query for German Special Characters (i.e., ä, ö, ß)

2007-09-14 Thread Tom Hill
Hi Marc, Are you using the same stemmer on your queries that you use when indexing? Try the analysis function in the admin UI, to see how things are stemmed for indexing vs. querying. If they don't match for really and fünny, and do match for kraßen, then that's your problem. Tom On 9/14/07,

Re: Query for German Special Characters (i.e., ä, ö, ß)

2007-09-14 Thread Marc Bechler
Hi Tom, thanks for your response -- and sorry for the newbie question, may sound somehow silly ;-) . Here the quick result of the analysis UI: Index for really: 5* really. Query for really: 5* really, 2* realli (from: EnglishPorterFilterFactory {protected=protwords.txt},

Re: Query for German Special Characters (i.e., ä, ö, ß)

2007-09-14 Thread Tom Hill
Hi Marc, The searches are going to look for an exact match of the query (after analysis) in the index (after analysis). So, realli will not match really. So you want to have the same stemmer (probably not the English one, given your examples) in both the index analyzer and the query analyzer.
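A toy Python illustration of Tom's point that query terms must match index terms exactly after analysis. toy_stem is a deliberately crude stand-in for the Porter stemmer (it only rewrites a trailing y to i, as Porter does for really to realli); it is not the real algorithm.

```python
def toy_stem(token):
    """Crude stand-in for a stemmer: rewrites a trailing 'y' to 'i', the way
    the Porter stemmer turns 'really' into 'realli'. Not the real algorithm."""
    if len(token) > 2 and token.endswith("y"):
        return token[:-1] + "i"
    return token


def analyze(text, stem):
    """Lowercase, split on whitespace, and optionally stem each token."""
    tokens = text.lower().split()
    return [toy_stem(t) for t in tokens] if stem else tokens


def matches(query, indexed_text, stem_index, stem_query):
    """True if every analyzed query term occurs among the analyzed index terms."""
    index_terms = set(analyze(indexed_text, stem_index))
    return all(term in index_terms for term in analyze(query, stem_query))
```

With stem_index=False and stem_query=True (the mismatch the analysis output above shows), this and is still match but really and fünny do not; applying the same analysis on both sides makes all four match.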

Re: Authentication

2007-09-14 Thread Bill Au
I meant outside of the Solr code. You are right that it is still in the Solr war file since you will need to put the authentication configuration into web.xml. Bill On 9/14/07, jenix [EMAIL PROTECTED] wrote: When you say outside of Solr do you mean outside of solr.war? We finally got

Re: Query for German Special Characters (i.e., ä, ö, ß)

2007-09-14 Thread Marc Bechler
Hi Tom, thanks for your professional response -- works fine and looks good :-). Since I am playing around with mixed texts (English and German), I do not have any idea whether or not an EnglishPorter will be useful for German texts. But I will find it out by playing around ;-) Regards from

RE: Authentication

2007-09-14 Thread Lance Norskog
You can try the public/private key certificate system. You deploy it to jetty/tomcat somehow, and curl has options to send it. We haven't tried this. The authentication happens at the http container level, not in the solr config. -Original Message- From: Bill Au [mailto:[EMAIL PROTECTED]

Re: Query for German Special Characters (i.e., ä, ö, ß)

2007-09-14 Thread Walter Underwood
You could index into multiple fields with different analyzers and search all of them: text_en (uses English stemmer), text_de (uses German stemmer), text_exact (no stemming), text_strip (uses ISOLatin1AccentFilter). You can search all of these and put different boosts on them, with higher boosts for
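A sketch of the query side of Walter's multi-field suggestion: one Lucene query that searches the same term across the four fields, each with its own boost (field:term^boost). The boost values below are illustrative guesses, not from the thread, and the term is assumed to contain no Lucene special characters.

```python
def multi_field_query(term, field_boosts):
    """Search one term across several differently-analyzed fields.

    Produces optional clauses of the form field:term^boost; with the default
    OR operator a document matching any one field is returned, and matches in
    higher-boosted fields tend to score higher. Assumes `term` is a single
    word with no Lucene special characters (no escaping is done here).
    """
    return " ".join(f"{field}:{term}^{boost}" for field, boost in field_boosts.items())


# Illustrative boosts only; the thread does not give the actual values.
BOOSTS = {"text_exact": 4, "text_strip": 2, "text_de": 1, "text_en": 1}
```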

RE: hl.snippets per field override

2007-09-14 Thread Nathaniel E. Powell
I apologize for missing that. I added an anchor at the top and a link where the word overrides is in the wiki. Thanks, -Nathan -Original Message- From: Erik Hatcher [mailto:[EMAIL PROTECTED] Sent: Friday, September 14, 2007 10:53 AM To: solr-user@lucene.apache.org Subject: Re:

Re: Authentication

2007-09-14 Thread Chris Hostetter
: When you say outside of Solr do you mean outside of solr.war? We finally : got php/curl working with jetty's Basic Authentication. We had to unpack and : repack solr.war to edit web.xml and it would have been nice to use some : other method. it should not be necessary to unpack the war ... you

Re: Slow response

2007-09-14 Thread Tom Hill
Hi Mike, Thanks for clarifying what has been a bit of a black box to me. A couple of questions, to increase my understanding, if you don't mind. If I am only using fields with multiValued=false, with a type of string or integer (untokenized), does solr automatically use approach 2? Or is this

Re: Slow response

2007-09-14 Thread Mike Klaas
On 14-Sep-07, at 3:38 PM, Tom Hill wrote: Hi Mike, Thanks for clarifying what has been a bit of a black box to me. A couple of questions, to increase my understanding, if you don't mind. If I am only using fields with multiValued=false, with a type of string or integer (untokenized),

Triggering snapshooter through web admin interface

2007-09-14 Thread Wu, Daniel
Hi, I am not sure if this can be done. Let's say if periodically there is a big batch to be indexed and we don't want to replicate the data before the batch is completely indexed. We would like to avoid the post commit hook as we will be periodically committing to reduce the memory usage and we

Re: Synchronize large number of records with Solr

2007-09-14 Thread Chris Hostetter
: number of records. We collect our data from a number of sources and each : source produces around 50,000 docs. Each of these documents has a sourceId : field indicating the source of the document. Now assuming we're indexing all : documents from SourceA (sourceId=SourceA), majority of these docs

Re: Synchronize large number of records with Solr

2007-09-14 Thread Walter Underwood
You could MD4 the parts you care about, store that, fetch it and compare. If there is a reliable timestamp, you could use that. But that would be app-dependent. In general, you need to store some info about each source document and figure out whether it is new. This gets much hairier with a web
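A minimal Python sketch of the hash-and-compare idea. It swaps in MD5 from hashlib for MD4, since MD4 is not reliably available in Python's hashlib; the field list and the stored_digests map (document id to the digest saved at the previous run) are hypothetical.

```python
import hashlib
import json


def fingerprint(record, fields):
    """MD5 over only the fields we care about, in a fixed order, so identical
    content always yields the same digest and other fields are ignored."""
    payload = json.dumps([record.get(f) for f in fields], ensure_ascii=False)
    return hashlib.md5(payload.encode("utf-8")).hexdigest()


def needs_reindex(record, fields, stored_digests):
    """stored_digests: hypothetical map of document id -> digest from the last run.
    New documents (no stored digest) and changed documents both need reindexing."""
    return stored_digests.get(record["id"]) != fingerprint(record, fields)
```

After a successful indexing run, the new digests would be saved so the next run can compare against them.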