I don't think LCF *can* necessarily communicate enough information for a 
downstream handler to make optimal smart decisions about committing, because 
effectively that would require LCF to predict the future.  For example, if you 
*knew* that a job was coming to an end shortly, you might delay a commit until 
that happened - but such certainty requires abilities beyond mere software.

My concern with this feature is that it will go one of three ways:


(1)    Nobody will use it at all, but will instead configure Solr 
appropriately, as the initial design intended.

(2)    People will use it, but will never be satisfied with the amount of info 
that LCF sends downstream for decision making - they'll always want more.  For 
example, they'll start to want a notification after every X documents have been 
processed by a job.  Then, they'll want a notification after a continuous job 
has been idle for more than Y seconds.  Etc.  And in the end, the final results 
will *still* not be adequate for everyone's needs, because you're still trying 
to predict the future, and that's impossible.

(3)    People will just use the feature in the dumbest possible way: causing a 
commit on every job end, for example, and avoiding the lack of a job end on 
continuous jobs by never using continuous jobs.

I am also getting very concerned that so many "requirements" seem to be coming 
from "initial evaluation of LCF".  That sounds to me like features that won't 
really help anyone in the long run.

Karl


From: ext Jack Krupansky [mailto:[email protected]]
Sent: Wednesday, June 02, 2010 10:50 AM
To: [email protected]
Subject: Re: Setting up Solr -- commit, event notifications

Yes, a sophisticated app with lots of complex jobs will have to be quite smart 
about how it decides to commit. The goal for LCF would be simply to supply 
enough job status so that such a sophisticated app could decide that the job 
status warrants a commit. As I suggested, the simplest case would be to see 
that all non-continuous jobs (at least those that the app cares about) have 
completed.

The app end might or might not be Solr itself. It could indeed be a plug-in for 
Solr, or just some other app process that has the specified context handler.

And, yes, the "commit at end of job" option is not terribly useful for complex, 
overlapping job arrangements. Its primary use case is for initial evaluation 
of LCF. But it might be sufficient for some simpler apps. Not all Solr apps are 
horribly complicated.

Maybe the option should technically be spec'ed as "commit at end of job, but 
only if no other jobs are active with the Solr output connector".
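
As a rough illustration of that rule (a sketch only, not LCF code; the function name, the job names, and the job-to-connection mapping are all hypothetical), the check could look like this:

```python
def should_commit(finished_job: str, active_jobs: dict) -> bool:
    """Decide whether the end of `finished_job` should trigger a commit.

    `active_jobs` maps each currently active job name (including the one
    that is finishing) to the name of its output connection. This mapping
    is an assumption made for the sketch; LCF does not expose it this way.
    """
    connection = active_jobs[finished_job]
    # Commit only if no *other* active job writes to the same connection.
    return not any(
        conn == connection
        for job, conn in active_jobs.items()
        if job != finished_job
    )


if __name__ == "__main__":
    jobs = {"web": "solr-main", "files": "solr-main", "wiki": "solr-test"}
    print(should_commit("web", jobs))   # another job shares "solr-main"
    print(should_commit("wiki", jobs))  # "wiki" is alone on "solr-test"
```

Note that this only covers jobs that are active at the moment of the check; it says nothing about jobs that are about to start.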

In some cases you might only want to commit when a specific job completes. For 
example, maybe a series of jobs are scheduled to run in sequence and the commit 
is only desired on completion of the final job in that sequence. In that case, 
the option is desired at the job level rather than for the Solr output 
connection itself. Is there any provision for job-specific output connector 
options?

-- Jack Krupansky

From: [email protected]<mailto:[email protected]>
Sent: Wednesday, June 02, 2010 10:19 AM
To: 
[email protected]<mailto:[email protected]>
Subject: RE: Setting up Solr

What about job deletion document cleanup, etc?  Overlapping job runs using the 
same output connection?  We've had this discussion before; the connector can 
certainly have hooks added but unless you intend to construct some kind of data 
structure on the Solr end that tries to keep track of all that, you're likely 
not going to get quite what you are looking for.

Karl


From: ext Jack Krupansky [mailto:[email protected]]
Sent: Wednesday, June 02, 2010 10:15 AM
To: [email protected]
Subject: Re: Setting up Solr

It would be nice to have a "commit at end of job" option for the Solr output 
connector. Granted, commit policy can be a lot more complicated than that, but 
it is a simple use case that would facilitate initial evaluations of LCF with 
Solr.

Thinking further ahead, it would be very useful to have "job status 
notification" messages that could be sent to an app (say, a 
"/update/lcf-job-status" request handler) that would note start, end, abort, 
and periodic status of LCF jobs. Then the app could commit as it desires with 
respect to individual job completion and larger collections of jobs for 
different repositories. For example, an app might wait for all non-continuous 
jobs to complete before committing. That would be a more comprehensive 
longer-term solution for the commit problem, but the simple end-of-job commit 
option would be more user-friendly in the near-term.
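
To make the idea concrete, here is a minimal sketch of the bookkeeping such an app might do. Nothing here is an existing LCF or Solr API: the class, the event names ("start", "end", "abort"), and the "continuous" flag are assumptions for illustration, and treating an abort like a normal end is just one possible policy.

```python
from dataclasses import dataclass, field


@dataclass
class JobStatusTracker:
    """Tracks hypothetical LCF job notifications and decides when to commit."""

    # Maps job id -> True if the job is continuous, False otherwise.
    active: dict = field(default_factory=dict)

    def on_event(self, job_id: str, event: str, continuous: bool = False) -> bool:
        """Record an event and return True if the app should commit now.

        The policy sketched here: commit when a job ends or aborts and no
        non-continuous job is still running.
        """
        if event == "start":
            self.active[job_id] = continuous
            return False
        if event in ("end", "abort"):
            self.active.pop(job_id, None)
            # Continuous jobs never finish, so they are ignored here.
            return not any(not is_cont for is_cont in self.active.values())
        raise ValueError(f"unknown event: {event}")


if __name__ == "__main__":
    tracker = JobStatusTracker()
    tracker.on_event("crawl-a", "start")
    tracker.on_event("crawl-b", "start")
    tracker.on_event("feed", "start", continuous=True)
    print(tracker.on_event("crawl-a", "end"))  # crawl-b is still running
    print(tracker.on_event("crawl-b", "end"))  # only the continuous job is left
```

A real handler would also need to decide what to do about periodic status messages and about jobs that start right after a commit.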

-- Jack Krupansky

From: [email protected]<mailto:[email protected]>
Sent: Wednesday, June 02, 2010 9:09 AM
To: 
[email protected]<mailto:[email protected]>
Subject: RE: Setting up Solr

Solr has autocommit functionality built in.  Google for it and you will find 
out how to configure it.
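
For reference, autocommit is set in the updateHandler section of solrconfig.xml. A minimal fragment follows; the two thresholds are illustrative values chosen for this sketch, not recommendations, and should be tuned to the crawl:

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <!-- commit once this many documents are pending (illustrative value) -->
    <maxDocs>10000</maxDocs>
    <!-- or once this many milliseconds have passed since the first pending add (illustrative value) -->
    <maxTime>60000</maxTime>
  </autoCommit>
</updateHandler>
```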

Karl

From: ext [email protected]<mailto:[email protected]> 
[mailto:[email protected]]
Sent: Wednesday, June 02, 2010 9:08 AM
To: 
[email protected]<mailto:[email protected]>
Subject: RE: Setting up Solr

Can we have a job for this? If not, is there any other way? (On Windows? On 
Linux there are cron jobs.)

Thanks & Regards,
Rohan G Patil
Cognizant  Programmer Analyst Trainee,Bangalore || Mob # +91 9535577001
[email protected]<mailto:[email protected]>

From: [email protected] [mailto:[email protected]]
Sent: Wednesday, June 02, 2010 6:32 PM
To: [email protected]
Subject: RE: Setting up Solr

You can send any argument you want by configuring the output connector.  
However, the explicit commit on every post will slow down performance of your 
crawls.

Karl

From: ext [email protected] [mailto:[email protected]]
Sent: Wednesday, June 02, 2010 9:00 AM
To: [email protected]
Subject: RE: Setting up Solr

Hi,

Yes, that is where I was stuck: making an explicit commit.

Can I send the argument commit=true while configuring the Repo connector?

Thanks & Regards,
Rohan G Patil
Cognizant  Programmer Analyst Trainee,Bangalore || Mob # +91 9535577001
[email protected]<mailto:[email protected]>

From: Jack Krupansky [mailto:[email protected]]
Sent: Wednesday, June 02, 2010 4:42 PM
To: [email protected]
Subject: Re: Setting up Solr

A short Solr tutorial is here:

http://lucene.apache.org/solr/tutorial.html

After running an LCF job that uses a Solr output connection, be sure to 
manually force a Solr "commit", for example:

    cd .../apache-solr-1.4.0/example/exampledocs
    java -jar post.jar
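
The same commit can also be issued by POSTing a `<commit/>` message to Solr's update handler. The sketch below only builds the request; the URL is an assumption (the default for the Solr example server) and the function name is made up for illustration.

```python
import urllib.request


def build_commit_request(update_url: str = "http://localhost:8983/solr/update"):
    """Build (but do not send) an HTTP POST asking Solr to commit.

    The default URL assumes the stock example server; adjust it for any
    other host, port, or core.
    """
    return urllib.request.Request(
        update_url,
        data=b"<commit/>",
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_commit_request()
    print(request.get_method(), request.full_url)
    # To actually send it, a Solr server must be running at that URL:
    # with urllib.request.urlopen(request) as response:
    #     print(response.status)
```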

-- Jack Krupansky

From: [email protected]<mailto:[email protected]>
Sent: Wednesday, June 02, 2010 1:46 AM
To: 
[email protected]<mailto:[email protected]>
Subject: Setting up Solr

Hi,

I am stuck at setting up the Solr server to be used with LCF.

I am new to Solr.

Thanks & Regards,
Rohan G Patil
Cognizant  Programmer Analyst Trainee,Bangalore || Mob # +91 9535577001
[email protected]<mailto:[email protected]>

This e-mail and any files transmitted with it are for the sole use of
the intended recipient(s) and may contain confidential and privileged
information.
If you are not the intended recipient, please contact the sender by
reply e-mail and destroy all copies of the original message.
Any unauthorized review, use, disclosure, dissemination, forwarding,
printing or copying of this email or any action taken in reliance on this
e-mail is strictly prohibited and may be unlawful.

