[jira] [Commented] (CONNECTORS-764) Hopcount logic fails to notice when the max number of hops is increased between crawls

Karl Wright (JIRA) Wed, 14 Aug 2013 02:44:48 -0700

    [ 
https://issues.apache.org/jira/browse/CONNECTORS-764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739451#comment-13739451
 ]


Karl Wright commented on CONNECTORS-764:
----------------------------------------

Hmm, my tests are indicating a problem:

{code}
ERROR 2013-08-14 05:38:48,455 (Startup thread) - Startup thread aborting and restarting due to database connection reset: Database exception: SQLException doing query (23503): ERROR: update or delete on table "jobqueue" violates foreign key constraint "prereqevents_owner_fkey" on table "prereqevents"
  Detail: Key (id)=(1376473092294) is still referenced from table "prereqevents".
org.apache.manifoldcf.core.interfaces.ManifoldCFException: Database exception: SQLException doing query (23503): ERROR: update or delete on table "jobqueue" violates foreign key constraint "prereqevents_owner_fkey" on table "prereqevents"
  Detail: Key (id)=(1376473092294) is still referenced from table "prereqevents".
        at org.apache.manifoldcf.core.database.Database.executeViaThread(Database.java:717)
        at org.apache.manifoldcf.core.database.Database.executeUncachedQuery(Database.java:745)
        at org.apache.manifoldcf.core.database.Database$QueryCacheExecutor.create(Database.java:1430)
        at org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:144)
        at org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:186)
        at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performModification(DBInterfacePostgreSQL.java:646)
        at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performDelete(DBInterfacePostgreSQL.java:280)
        at org.apache.manifoldcf.core.database.BaseTable.performDelete(BaseTable.java:91)
        at org.apache.manifoldcf.crawler.jobs.JobQueue.prepareFullScan(JobQueue.java:577)
        at org.apache.manifoldcf.crawler.jobs.JobManager.prepareFullScan(JobManager.java:5592)
        at org.apache.manifoldcf.crawler.jobs.JobManager.prepareJobScan(JobManager.java:5506)
        at org.apache.manifoldcf.crawler.system.StartupThread.run(StartupThread.java:142)
Caused by: org.postgresql.util.PSQLException: ERROR: update or delete on table "jobqueue" violates foreign key constraint "prereqevents_owner_fkey" on table "prereqevents"
  Detail: Key (id)=(1376473092294) is still referenced from table "prereqevents".
        at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2102)
        at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1835)
        at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:257)
        at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:500)
        at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:388)
        at org.postgresql.jdbc2.AbstractJdbc2Statement.executeUpdate(AbstractJdbc2Statement.java:334)
        at org.apache.manifoldcf.core.database.Database.execute(Database.java:876)
        at org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.run(Database.java:677)
{code}

So I think there's another problem buried here as well.  Digging now...
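
For readers unfamiliar with the error: SQLSTATE 23503 is PostgreSQL's foreign-key violation code, raised here because a jobqueue row was deleted while a prereqevents row still pointed at it. The sketch below is only an in-memory illustration of that ordering constraint. It is not ManifoldCF code, the class and method names are hypothetical, and it does not claim to show the actual root cause, which is still being investigated.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** In-memory stand-in for two tables linked by a foreign key (hypothetical model). */
final class FkOrderingSketch {
    // Parent rows: ids in the simulated "jobqueue" table.
    private final Set<Long> jobQueue = new HashSet<>();
    // Child rows: simulated "prereqevents" row id -> owning jobqueue id.
    private final Map<Long, Long> prereqEvents = new HashMap<>();
    private long nextEventId = 1;

    void insertJobQueueRow(long id) {
        jobQueue.add(id);
    }

    void insertPrereqEvent(long ownerId) {
        if (!jobQueue.contains(ownerId)) {
            throw new IllegalStateException("owner " + ownerId + " does not exist");
        }
        prereqEvents.put(nextEventId++, ownerId);
    }

    /** Deletes a parent row, rejecting the delete while child rows still reference it. */
    void deleteJobQueueRow(long id) {
        if (prereqEvents.containsValue(id)) {
            throw new IllegalStateException(
                "Key (id)=(" + id + ") is still referenced from table \"prereqevents\"");
        }
        jobQueue.remove(id);
    }

    /** Deletes every child row owned by the given parent id. */
    void deletePrereqEventsFor(long ownerId) {
        prereqEvents.values().removeIf(owner -> owner == ownerId);
    }

    boolean hasJobQueueRow(long id) {
        return jobQueue.contains(id);
    }

    public static void main(String[] args) {
        FkOrderingSketch db = new FkOrderingSketch();
        db.insertJobQueueRow(42L);
        db.insertPrereqEvent(42L);
        try {
            db.deleteJobQueueRow(42L); // parent first: rejected
        } catch (IllegalStateException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
        db.deletePrereqEventsFor(42L); // children first
        db.deleteJobQueueRow(42L);     // now the parent delete goes through
        System.out.println("Parent still present: " + db.hasJobQueueRow(42L));
    }
}
```

In this model the parent delete only succeeds once the dependent rows are gone. Whether the delete in JobQueue.prepareFullScan needs that kind of reordering, or whether the prereqevents rows should not exist at that point at all, is exactly what remains to be determined.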

                
> Hopcount logic fails to notice when the max number of hops is increased 
> between crawls
> --------------------------------------------------------------------------------------
>
>                 Key: CONNECTORS-764
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-764
>             Project: ManifoldCF
>          Issue Type: Bug
>          Components: Framework crawler agent
>    Affects Versions: ManifoldCF 1.3
>            Reporter: Karl Wright
>            Assignee: Karl Wright
>             Fix For: ManifoldCF 1.4
>
>
> When you do something like the following:
> (1) Set the max hops for a job relatively low
> (2) Crawl
> (3) Increase the max hops
> (4) Crawl again
> ... the documents that are labeled with the state "Hop count exceeded" at the 
> end of the first crawl are never touched again.  This is because there are no 
> additional links added to the intrinsiclink table during the second crawl, 
> and thus the method reactivateHopcountRemovedRecords() is never called, 
> leaving the documents in an incorrect state.
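
The scenario described above can be sketched with a small in-memory model. This is a simplified illustration under stated assumptions, not the ManifoldCF implementation: the class, the `crawl` signature, the `checkLimitChange` flag, and the idea of remembering the previous limit are all hypothetical. Only the method name reactivateHopcountRemovedRecords() comes from the description, and its body here is invented for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model of hop-count filtering across two crawls (hypothetical, simplified). */
final class HopcountSketch {
    enum State { ACTIVE, HOPCOUNT_EXCEEDED }

    // Stand-in for recorded links: document id -> hop distance from the seeds.
    private final Map<String, Integer> distance = new LinkedHashMap<>();
    private final Map<String, State> states = new LinkedHashMap<>();
    // Assumption for the sketch: the limit used by the previous crawl is remembered.
    private int lastMaxHops = -1;

    /** Re-evaluates records that were previously marked as exceeding the hop limit. */
    private void reactivateHopcountRemovedRecords(int maxHops) {
        for (Map.Entry<String, State> e : states.entrySet()) {
            if (e.getValue() == State.HOPCOUNT_EXCEEDED && distance.get(e.getKey()) <= maxHops) {
                e.setValue(State.ACTIVE);
            }
        }
    }

    /**
     * Runs one crawl. With checkLimitChange == false, reactivation happens only when
     * new links were recorded, which mirrors the behavior described in this issue.
     */
    void crawl(Map<String, Integer> discovered, int maxHops, boolean checkLimitChange) {
        boolean linksAdded = false;
        for (Map.Entry<String, Integer> e : discovered.entrySet()) {
            if (!distance.containsKey(e.getKey())) {
                distance.put(e.getKey(), e.getValue());
                states.put(e.getKey(),
                    e.getValue() <= maxHops ? State.ACTIVE : State.HOPCOUNT_EXCEEDED);
                linksAdded = true;
            }
        }
        boolean limitRaised = checkLimitChange && lastMaxHops >= 0 && maxHops > lastMaxHops;
        if (linksAdded || limitRaised) {
            reactivateHopcountRemovedRecords(maxHops);
        }
        lastMaxHops = maxHops;
    }

    State stateOf(String id) {
        return states.get(id);
    }

    public static void main(String[] args) {
        Map<String, Integer> docs = new LinkedHashMap<>();
        docs.put("near", 1);
        docs.put("far", 3);

        HopcountSketch job = new HopcountSketch();
        job.crawl(docs, 2, false); // steps (1) and (2): low limit, crawl
        job.crawl(docs, 5, false); // steps (3) and (4): higher limit, crawl again
        System.out.println("far after second crawl: " + job.stateOf("far"));
    }
}
```

With `checkLimitChange` set to false, the second crawl records no new links, so the reactivation step is skipped and the distant document keeps its "hop count exceeded" state. Passing true shows one possible way to notice the raised limit; the fix actually adopted for this issue may differ.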

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
