LCF already lets you throttle to a maximum number of connections for a specific
output connection instance.  So while there's no provision for limiting the
speed at which data gets thrown to Solr on each connection, there's a limit to
how many connections there are at any given time.

Hopefully this is sufficient.

Karl
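
A minimal sketch of the idea in the message above: capping how many connections are open at once, without limiting the rate on any one connection. This is not LCF's code; `ConnectionThrottle`, `post_document`, and `MAX_CONNECTIONS` are hypothetical names, and the sleep stands in for a real HTTP post to Solr.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_CONNECTIONS = 2  # hypothetical cap on simultaneous connections


class ConnectionThrottle:
    """Context manager that admits at most max_connections callers at once."""

    def __init__(self, max_connections):
        self._slots = threading.BoundedSemaphore(max_connections)
        self._lock = threading.Lock()
        self.active = 0  # connections currently in use
        self.peak = 0    # highest number ever in use at once

    def __enter__(self):
        self._slots.acquire()  # blocks while all slots are taken
        with self._lock:
            self.active += 1
            self.peak = max(self.peak, self.active)
        return self

    def __exit__(self, exc_type, exc, tb):
        with self._lock:
            self.active -= 1
        self._slots.release()
        return False  # do not swallow exceptions


def post_document(throttle, doc_id):
    """Pretend to post one document; the sleep stands in for the HTTP call."""
    with throttle:
        time.sleep(0.01)
        return doc_id


throttle = ConnectionThrottle(MAX_CONNECTIONS)
# Eight worker threads compete, but only MAX_CONNECTIONS post at a time.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(lambda d: post_document(throttle, d), range(20)))
```

Each post still runs at full speed once it holds a slot; only the number of simultaneous posts is bounded.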


-----Original Message-----
From: ext Erik Hatcher [mailto:[email protected]] 
Sent: Wednesday, June 02, 2010 11:21 AM
To: [email protected]
Subject: Re: Setting up Solr -- commit

autocommit is really the right answer here for the discussions going  
on today.  When there are multiple streams of incoming documents to  
Solr, unless you want to build some kind of coordinated system that'll  
control commits, simply use autocommit.  A commit per doc is definitely
not recommended; it is highly discouraged.

As for indexing - it really is the more the merrier, to a point.   
Server RAM is needed to handle incoming requests, and these rich  
documents are typically large'ish.  Throttling so as to not add too  
many (how many is that?  gotta test with your system and RAM and  
solrconfig.xml settings) docs at a time is going to be needed in some  
way.

        Erik
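
To make the autocommit point above concrete, here is a small simulation of a commit policy that fires after a document count or an elapsed time, whichever comes first. In Solr this behaviour is configured server-side through the autoCommit settings (maxDocs, maxTime) in solrconfig.xml; the Python below is only an illustration of the policy, not Solr's implementation, and `AutoCommitPolicy` and its thresholds are hypothetical.

```python
class AutoCommitPolicy:
    """Commit when max_docs are pending or the oldest pending doc is max_time_s old."""

    def __init__(self, max_docs, max_time_s):
        self.max_docs = max_docs
        self.max_time_s = max_time_s
        self.pending = 0              # docs added since the last commit
        self.first_pending_at = None  # time the oldest pending doc arrived
        self.commits = 0              # number of commits performed

    def add(self, now):
        """Record one incoming document at time `now` (seconds)."""
        if self.pending == 0:
            self.first_pending_at = now
        self.pending += 1
        self._maybe_commit(now)

    def tick(self, now):
        """Periodic check so a quiet period still flushes pending docs."""
        self._maybe_commit(now)

    def _maybe_commit(self, now):
        if self.pending == 0:
            return
        too_many = self.pending >= self.max_docs
        too_old = now - self.first_pending_at >= self.max_time_s
        if too_many or too_old:
            self.commits += 1
            self.pending = 0
            self.first_pending_at = None


# Hypothetical thresholds: commit every 100 docs or every 60 seconds.
policy = AutoCommitPolicy(max_docs=100, max_time_s=60.0)
for i in range(250):           # 250 docs arriving 0.1 s apart, from any client
    policy.add(now=i * 0.1)
commits_during_load = policy.commits
pending_after_load = policy.pending
policy.tick(now=100.0)         # later timer check flushes the remainder
```

Because the server decides when to commit, multiple clients posting in parallel never need to coordinate, and 250 documents cost a handful of commits rather than 250.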


On Jun 2, 2010, at 11:15 AM, Jack Krupansky wrote:

> I did in fact try setting commit in the Solr output connection  
> arguments a month ago. It kind of worked, but Solr gave some errors  
> on occasion due to overlapping requests - one request did a commit  
> while other parallel requests from LCF were in various stages of  
> processing. I do not recall whether I tried to set JVM throttling to  
> 1 to force sequential processing of posted documents, but you don't  
> really want to have to force sequential processing anyway.
>
> Side note to Solr guys: What is the "contract" for the  
> ExtractingRequestHandler in terms of handling parallel requests? Is  
> it "the more the merrier" (including lots of PDF files?), or are  
> there specific issues that the client must/should worry about? There  
> is also the potential for multiple clients, LCF or other,  
> simultaneously blasting at /update/extract. Obviously those clients  
> can't know what each other is up to.
>
> -- Jack Krupansky
>
> From: [email protected]
> Sent: Wednesday, June 02, 2010 9:01 AM
> To: [email protected]
> Subject: RE: Setting up Solr
>
> You can send any argument you want by configuring the output  
> connector.  However, the explicit commit on every post will slow  
> down performance of your crawls.
>
> Karl
>
> From: ext [email protected] [mailto:[email protected]]
> Sent: Wednesday, June 02, 2010 9:00 AM
> To: [email protected]
> Subject: RE: Setting up Solr
>
> Hi,
>
> Yes, that is where I was stuck: making an explicit commit.
>
> Can I send the argument commit=true while configuring the Repo
> connector?
>
> Thanks & Regards,
> Rohan G Patil
> Cognizant Programmer Analyst Trainee, Bangalore || Mob # +91 9535577001
> [email protected]
>
> From: Jack Krupansky [mailto:[email protected]]
> Sent: Wednesday, June 02, 2010 4:42 PM
> To: [email protected]
> Subject: Re: Setting up Solr
>
> A short Solr tutorial is here:
>
> http://lucene.apache.org/solr/tutorial.html
>
> After running an LCF job that uses a Solr output connection, be sure  
> to manually force a Solr "commit", for example:
>
>     cd .../apache-solr-1.4.0/example/exampledocs
>     java -jar post.jar
>
> -- Jack Krupansky
>
> From: [email protected]
> Sent: Wednesday, June 02, 2010 1:46 AM
> To: [email protected]
> Subject: Setting up Solr
>
> Hi,
>
> I am stuck at setting up the Solr server to be used with LCF.
>
> I am new to Solr.
>
> Thanks & Regards,
> Rohan G Patil
> Cognizant Programmer Analyst Trainee, Bangalore || Mob # +91 9535577001
> [email protected]
>
> This e-mail and any files transmitted with it are for the sole use of
> the intended recipient(s) and may contain confidential and privileged
> information.
> If you are not the intended recipient, please contact the sender by
> reply e-mail and destroy all copies of the original message.
> Any unauthorized review, use, disclosure, dissemination, forwarding,
> printing or copying of this email or any action taken in reliance on this
> e-mail is strictly prohibited and may be unlawful.
>
