On 06.02.14 15:25, Karl Wright wrote:

> So I conclude that simple history is working fine, but since it is only
> returning indexing results within the last hour by default it is confusing
> you.  I also think it is likely that documents are getting skipped because
> you've crawled this set before with the same job and many of the documents
> have not changed.

Karl, we are indexing these documents.

I have tail -F running on our Solr test server at the moment:
[2014-02-06 15:21:00.321] INFO [uio] OP crawl {add=[http://www.ibsen.uio.no/brevmottakere.xhtml?bokstav=B]} 0 38
[2014-02-06 15:21:00.359] INFO [uio] OP crawl {add=[http://www.ibsen.uio.no/brevmottakere.xhtml?bokstav=N]} 0 23
[2014-02-06 15:21:29.732] INFO [uio] OP crawl {add=[http://www.ibsen.uio.no/brevmottakere.xhtml?bokstav=G]} 0 29
[2014-02-06 15:22:11.954] INFO [uio] OP crawl {add=[http://www.ibsen.uio.no/brevmottakere.xhtml?bokstav=S]} 0 38
[2014-02-06 15:22:15.752] INFO [uio] OP crawl {add=[http://www.ibsen.uio.no/brevmottakere.xhtml?bokstav=D]} 0 28
[2014-02-06 15:22:18.323] INFO [uio] OP crawl {add=[http://www.ibsen.uio.no/brevmottakere.xhtml?bokstav=H]} 0 34
[2014-02-06 15:22:21.657] INFO [uio] OP crawl {add=[http://www.ibsen.uio.no/variakronologi.xhtml]} 0 73

How could these log entries show up on our Solr server if the documents were skipped?

And why did I get entries like this earlier today:
DEBUG 2014-02-06 10:28:06,609 (Worker thread '29') - WEB: Decided to ingest 'http://www.ibsen.uio.no/varia.xhtml'

(I have since switched the log level back to INFO, so I cannot see these entries for the latest crawl, but I will re-enable DEBUG.)

I have re-ingested all documents several times today to make sure that every document was crawled again from scratch.

Of course, I could remove all jobs, delete all tables in PostgreSQL and recreate everything from scratch, in case the old settings were not upgraded successfully. Unfortunately, MCF would then delete everything in my index as well.

Erlend
