Hi Kate, I ran a job based on the same feed twice. Here are the results, from the simple history:
Start Time              | Activity               | Identifier                                     | Result Code | Bytes | Time | Result Description
08-16-2011 20:38:10.924 | job end                | 1313541280969(jazz)                            |             | 0     | 1
08-16-2011 20:37:57.179 | document ingest (solr) | http://www.onemansjazz.ca/content/view/331/30/ | 200         | 16980 | 18
08-16-2011 20:37:56.241 | fetch                  | http://www.onemansjazz.ca/content/view/331/30/ | 200         | 16980 | 905
08-16-2011 20:37:52.117 | document ingest (solr) | http://www.onemansjazz.ca/content/view/334/30/ | 200         | 16718 | 15
08-16-2011 20:37:51.241 | fetch                  | http://www.onemansjazz.ca/content/view/334/30/ | 200         | 16718 | 839
08-16-2011 20:37:47.292 | document ingest (solr) | http://www.onemansjazz.ca/content/view/330/50/ | 200         | 22605 | 19
08-16-2011 20:37:46.241 | fetch                  | http://www.onemansjazz.ca/content/view/330/50/ | 200         | 22605 | 1003
08-16-2011 20:37:42.149 | document ingest (solr) | http://www.onemansjazz.ca/content/view/333/30/ | 200         | 17606 | 19
08-16-2011 20:37:41.241 | fetch                  | http://www.onemansjazz.ca/content/view/333/30/ | 200         | 17606 | 887
08-16-2011 20:37:37.165 | document ingest (solr) | http://www.onemansjazz.ca/content/view/332/30/ | 200         | 17083 | 20
08-16-2011 20:37:36.241 | fetch                  | http://www.onemansjazz.ca/content/view/332/30/ | 200         | 17083 | 898
08-16-2011 20:37:32.783 | document ingest (solr) | http://www.onemansjazz.ca/content/view/336/30/ | 200         | 17473 | 19
08-16-2011 20:37:31.241 | fetch                  | http://www.onemansjazz.ca/content/view/336/30/ | 200         | 17473 | 922
08-16-2011 20:37:27.191 | document ingest (solr) | http://www.onemansjazz.ca/content/view/329/30/ | 200         | 17105 | 52
08-16-2011 20:37:26.241 | fetch                  | http://www.onemansjazz.ca/content/view/329/30/ | 200         | 17105 | 912
08-16-2011 20:37:21.241 | fetch                  | http://www.onemansjazz.ca/component/option,com_rss/feed,RSS2.... 0/no_html,1/ | 200         | 3973  | 542
08-16-2011 20:37:20.970 | job start              | 1313541280969(jazz)                            |             | 0     | 1
08-16-2011 20:37:00.893 | job end                | 1313541280969(jazz)                            |             | 0     | 1
08-16-2011 20:36:49.123 | document ingest (solr) | http://www.onemansjazz.ca/content/view/334/30/ | 200         | 16718 | 17
08-16-2011 20:36:48.076 | fetch                  | http://www.onemansjazz.ca/content/view/334/30/ | 200         | 16718 | 1028
08-16-2011 20:36:44.305 | document ingest (solr) | http://www.onemansjazz.ca/content/view/332/30/ | 200         | 17083 | 34
08-16-2011 20:36:43.076 | fetch                  | http://www.onemansjazz.ca/content/view/332/30/ | 200         | 17083 | 1208
08-16-2011 20:36:39.175 | document ingest (solr) | http://www.onemansjazz.ca/content/view/336/30/ | 200         | 17473 | 23
08-16-2011 20:36:38.076 | fetch                  | http://www.onemansjazz.ca/content/view/336/30/ | 200         | 17473 | 1087
08-16-2011 20:36:33.983 | document ingest (solr) | http://www.onemansjazz.ca/content/view/331/30/ | 200         | 16980 | 24
08-16-2011 20:36:33.076 | fetch                  | http://www.onemansjazz.ca/content/view/331/30/ | 200         | 16980 | 896
08-16-2011 20:36:29.297 | document ingest (solr) | http://www.onemansjazz.ca/content/view/329/30/ | 200         | 17105 | 24
08-16-2011 20:36:28.774 | document ingest (solr) | http://www.onemansjazz.ca/content/view/330/50/ | 200         | 22605 | 35
08-16-2011 20:36:28.076 | fetch                  | http://www.onemansjazz.ca/content/view/329/30/ | 200         | 17105 | 1204
08-16-2011 20:36:23.076 | fetch                  | http://www.onemansjazz.ca/content/view/330/50/ | 200         | 22605 | 5679
08-16-2011 20:36:21.130 | document ingest (solr) | http://www.onemansjazz.ca/content/view/333/30/ | 200         | 17606 | 418
08-16-2011 20:36:18.076 | fetch                  | http://www.onemansjazz.ca/content/view/333/30/ | 200         | 17606 | 2969
08-16-2011 20:36:13.094 | fetch                  | http://www.onemansjazz.ca/component/option,com_rss/feed,RSS2.... 0/no_html,1/ | 200         | 3973  | 1945
08-16-2011 20:36:10.870 | job start              | 1313541280969(jazz)                            |             | 0     | 1

Note that on each run, the size of each document being indexed changes. This is likely due to "chrome" (advertisements, etc.) which are dynamically delivered by the site in a random way.
The RSS connector will, of course, not be able to recognize that the content you are interested in hasn't changed, because as far as it can tell it *has*. This is very different from the case where you use the "dechromed" content based on the "description" field, because it is the actual feed description field that is indexed, not the document contents, and therefore no chrome will be present. Thus you are more likely to see repeated runs of a job index nothing if the job has a "dechromed" content mode set.

Karl

On Tue, Aug 16, 2011 at 5:07 PM, K McGonigal <kmcgon...@gmail.com> wrote:
> Hmm. I will keep this in mind, but I'm confused again. I just ran this job
> twice in a row and pretty much the same thing was sent to Solr. The same
> number of items (7) were "add"ed. I think they were the same items, just in
> a different order. The second run also deleted an item from Solr that was
> not in the RSS document. I'm pretty sure the RSS feed document or the
> linked documents did not change.
>
> A snippet from the first run:
>
>> INFO: {add=[http://www.onemansjazz.ca/content/view/330/50/]} 0 16
>> 16-Aug-2011 3:18:11 PM org.apache.solr.core.SolrCore execute
>> INFO: [] webapp=/solr path=/update/extract params={literal.source=http://www.onemansjazz.ca/component/option,com_rss/feed,RSS2.0/no_html,1/&literal.category=News+-+General&literal.summary=I+have+created+a+Listener+Survey+and+if+you+have+the+time+to+complete+it,+that+would+be+terrific.++I%26#39;m+trying+to+do+an+evaluation+of+One+Man%26#39;s+Jazz+as+well+as+considering+some+new+options+that+have+arisen.++Your+feedback+would+be+most+appreciate.This+survey+is+in+two+parts+and+is+a+total+of+twenty+parts,+most+of+them+just+require+a+click+of+your+mouse.++Click+here+(http://www.surveymonkey.com/s/C3DZ3JK)++for+Part+One,+and+here+(http://www.surveymonkey.com/s/C38FVH8)++for+Part+Two.+++Thanks+again+for+your+input.+&literal.id=http://www.onemansjazz.ca/content/view/330/50/&literal.title=Listener+Survey&literal.pubdate=1310475289000} status=0 QTime=16
>> 16-Aug-2011 3:18:13 PM org.apache.solr.update.processor.LogUpdateProcessor finish
>
> A snippet from the second run:
>
>> INFO: {add=[http://www.onemansjazz.ca/content/view/330/50/]} 0 15
>> 16-Aug-2011 3:27:55 PM org.apache.solr.core.SolrCore execute
>> INFO: [] webapp=/solr path=/update/extract params={literal.source=http://www.onemansjazz.ca/component/option,com_rss/feed,RSS2.0/no_html,1/&literal.category=News+-+General&literal.summary=I+have+created+a+Listener+Survey+and+if+you+have+the+time+to+complete+it,+that+would+be+terrific.++I%26#39;m+trying+to+do+an+evaluation+of+One+Man%26#39;s+Jazz+as+well+as+considering+some+new+options+that+have+arisen.++Your+feedback+would+be+most+appreciate.This+survey+is+in+two+parts+and+is+a+total+of+twenty+parts,+most+of+them+just+require+a+click+of+your+mouse.++Click+here+(http://www.surveymonkey.com/s/C3DZ3JK)++for+Part+One,+and+here+(http://www.surveymonkey.com/s/C38FVH8)++for+Part+Two.+++Thanks+again+for+your+input.+&literal.id=http://www.onemansjazz.ca/content/view/330/50/&literal.title=Listener+Survey&literal.pubdate=1310475289000} status=0 QTime=15
>> 16-Aug-2011 3:28:00 PM org.apache.solr.update.processor.LogUpdateProcessor finish
>
> I think they are identical.
>
>> View a Job
>> ------------------------------
>> Name: OMJ
>> ------------------------------
>> Output connection: Solr
>> Repository connection: RSS
>> ------------------------------
>> Priority: 5
>> Start method: Don't automatically start
>> ------------------------------
>> Schedule type: Scan every document once
>> Minimum recrawl interval: Not applicable
>> Expiration interval: Not applicable
>> Reseed interval: Not applicable
>> ------------------------------
>> No scheduled run times
>> ------------------------------
>> Field mappings:
>>   Metadata field name   Solr field name
>>   No field mapping specified
>> ------------------------------
>> RSS urls:
>> http://www.onemansjazz.ca/component/option,com_rss/feed,RSS2.0/no_html,1/
>> ------------------------------
>> No url canonicalization specified; will reorder all urls and remove all sessions
>> ------------------------------
>> No mappings specified; will accept all urls
>> ------------------------------
>> Feed connection timeout (seconds): 60
>> Default feed rescan interval (minutes): 60
>> Minimum feed rescan interval (minutes): 15
>> Bad feed rescan interval (minutes): (Default feed rescan value)
>> ------------------------------
>> Dechromed content source: none
>> Chromed content: none
>> ------------------------------
>> No access tokens specified
>> ------------------------------
>> No metadata specified
>
> View Repository Connection Status
> ------------------------------
> Name: RSS
> Description:
> ------------------------------
> Connection type: RSS
> Max connections: 10
> Authority: None (global authority)
> ------------------------------
> Throttling:
>   Bin regular expression   Description   Max avg fetches/min
>   No throttles
> ------------------------------
> Parameters:
>   Proxy port=
>   Proxy authentication password=********
>   Max server connections=2
>   Proxy host=
>   KB per second=64
>   Robots usage=none
>   Proxy authentication user name=
>   Max fetches per minute=12
>   Email address=kmcgon...@gmail.com
>   Proxy authentication domain=
>   Throttle group=
> ------------------------------
> Connection status: Connection working
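
To illustrate the point in the reply above about chrome defeating change detection, here is a minimal sketch. It assumes, purely for illustration, that a crawler decides whether to re-index by comparing a fingerprint of whatever text it would send to the index; this is not taken from the RSS connector's code, and `fingerprint`, `render_page`, `changed`, the banner names, and the sample description text are all hypothetical.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Stand-in version string: SHA-1 of the text that would be indexed."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def render_page(article: str, ad: str) -> str:
    """Hypothetical page: a fixed article body wrapped in rotating chrome."""
    return f"<html><div id='ad'>{ad}</div><div id='body'>{article}</div></html>"


def changed(previous: str, current: str) -> bool:
    """Treat a document as changed when its fingerprint differs between runs."""
    return fingerprint(previous) != fingerprint(current)


# Made-up feed item; the description text is identical on both runs.
description = "Listener survey announcement (sample text)"

# Two crawls of the same article, each served with a different advertisement.
run1 = {"description": description, "page": render_page(description, "banner-A")}
run2 = {"description": description, "page": render_page(description, "banner-B")}

page_changed = changed(run1["page"], run2["page"])
description_changed = changed(run1["description"], run2["description"])

print("full page changed:", page_changed)
print("description changed:", description_changed)
```

Under these assumptions the full page looks different on every run even though the article body is the same, while the feed's description field stays stable, which is why a "dechromed" content mode is more likely to let a repeated run index nothing.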