[
https://issues.apache.org/jira/browse/CONNECTORS-279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13138221#comment-13138221
]
Karl Wright commented on CONNECTORS-279:
----------------------------------------
Postgresql team responded and gave debugging tips, which show the following:
{code}
postgres=# select * from pg_locks;
locktype | database | relation | page | tuple | virtualxid | transactionid
| classid | objid | objsubid | virtualtransaction | pid | mode | g
ranted
---------------+----------+----------+------+-------+------------+--------------
-+---------+-------+----------+--------------------+------+------------------+--
-------
relation | 133853 | 140809 | | | |
| | | | 2/204727 | 7260 | AccessShareLock | t
relation | 133853 | 140809 | | | |
| | | | 2/204727 | 7260 | RowExclusiveLock | t
relation | 133853 | 140807 | | | |
| | | | 2/204727 | 7260 | AccessShareLock | t
relation | 133853 | 140807 | | | |
| | | | 2/204727 | 7260 | RowExclusiveLock | t
virtualxid | | | | | 2/204727 |
| | | | 2/204727 | 7260 | ExclusiveLock | t
virtualxid | | | | | 3/185434 |
| | | | 3/185434 | 2384 | ExclusiveLock | t
virtualxid | | | | | 1/216105 |
| | | | 1/216105 | 1800 | ExclusiveLock | t
relation | 133853 | 140751 | | | |
| | | | 2/204727 | 7260 | AccessShareLock | t
relation | 133853 | 140745 | | | |
| | | | 2/204727 | 7260 | AccessShareLock | t
relation | 133853 | 140810 | | | |
| | | | 2/204727 | 7260 | AccessShareLock | t
relation | 133853 | 140810 | | | |
| | | | 2/204727 | 7260 | RowExclusiveLock | t
relation | 133853 | 140808 | | | |
| | | | 2/204727 | 7260 | AccessShareLock | t
relation | 133853 | 140808 | | | |
| | | | 2/204727 | 7260 | RowExclusiveLock | t
relation | 133853 | 140812 | | | |
| | | | 2/204727 | 7260 | AccessShareLock | t
relation | 133853 | 140812 | | | |
| | | | 2/204727 | 7260 | RowExclusiveLock | t
relation | 133853 | 140784 | | | |
| | | | 2/204727 | 7260 | AccessShareLock | t
relation | 133853 | 140785 | | | |
| | | | 1/216105 | 1800 | ShareLock | f
relation | 133853 | 140791 | | | |
| | | | 2/204727 | 7260 | AccessShareLock | t
relation | 133853 | 140791 | | | |
| | | | 2/204727 | 7260 | RowExclusiveLock | t
relation | 133853 | 140785 | | | |
| | | | 2/204727 | 7260 | AccessShareLock | t
relation | 133853 | 140785 | | | |
| | | | 2/204727 | 7260 | RowExclusiveLock | t
transactionid | | | | | | 6252988
| | | | 2/204727 | 7260 | ExclusiveLock | t
relation | 133853 | 140811 | | | |
| | | | 2/204727 | 7260 | AccessShareLock | t
relation | 133853 | 140811 | | | |
| | | | 2/204727 | 7260 | RowExclusiveLock | t
relation | 133853 | 140813 | | | |
| | | | 2/204727 | 7260 | AccessShareLock | t
relation | 133853 | 140813 | | | |
| | | | 2/204727 | 7260 | RowExclusiveLock | t
relation | 11564 | 10969 | | | |
| | | | 3/185434 | 2384 | AccessShareLock | t
(27 rows)
postgres=# select * from pg_stat_activity;
datid | datname | procpid | usesysid | usename | current_query
| waiting | xact_start | query_start |
backend_start | client_addr | client_port
--------+----------+---------+----------+----------+----------------------------
-----+---------+----------------------------+----------------------------+------
----------------------+-------------+-------------
133853 | testdb | 1800 | 68231 | testuser | REINDEX TABLE jobqueue
| t | 2011-10-28 06:53:59.562-04 | 2011-10-28 06:53:59.562-04 | 2011-
10-28 06:53:57.789-04 | 127.0.0.1 | 54835
133853 | testdb | 7260 | 68231 | testuser | <IDLE> in transaction
| f | 2011-10-28 06:53:33.481-04 | 2011-10-28 06:53:34.895-04 | 2011-
10-28 06:53:33.404-04 | 127.0.0.1 | 54828
11564 | postgres | 2384 | 10 | postgres | select * from pg_stat_activ
ity; | f | 2011-10-28 06:57:14.212-04 | 2011-10-28 06:57:14.212-04 | 2011-
10-28 06:55:12.74-04 | ::1 | 54852
133853 | testdb | 7664 | 68231 | testuser | <IDLE>
| f | | 2011-10-28 06:57:11.029-04 | 2011-
10-28 06:57:11.017-04 | 127.0.0.1 | 54876
133853 | testdb | 3128 | 68231 | testuser | <IDLE>
| f | | 2011-10-28 06:57:11.059-04 | 2011-
10-28 06:57:11.052-04 | 127.0.0.1 | 54877
(5 rows)
"Thread-4398639" daemon prio=6 tid=0x087f9400 nid=0x9f0 runnable [0x04b4f000]
java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:129)
at
org.postgresql.core.VisibleBufferedInputStream.readMore(VisibleBufferedInputStream.java:135)
at
org.postgresql.core.VisibleBufferedInputStream.ensureBytes(VisibleBufferedInputStream.java:104)
at
org.postgresql.core.VisibleBufferedInputStream.read(VisibleBufferedInputStream.java:73)
at org.postgresql.core.PGStream.ReceiveChar(PGStream.java:255)
at
org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1165)
at
org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:191)
- locked <0x260f2220> (a org.postgresql.core.v3.QueryExecutorImpl)
at
org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:452)
at
org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:337)
at
org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:329)
at
org.apache.manifoldcf.core.database.Database.execute(Database.java:566)
at
org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.run(Database.java:421)
"Document delete thread '2'" daemon prio=6 tid=0x05660c00 nid=0xc88 in
Object.wait() [0x06d8f000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.Thread.join(Thread.java:1143)
- locked <0x260f0118> (a
org.apache.manifoldcf.core.database.Database$ExecuteQueryThread)
at java.lang.Thread.join(Thread.java:1196)
at
org.apache.manifoldcf.core.database.Database.executeViaThread(Database.java:453)
at
org.apache.manifoldcf.core.database.Database.executeUncachedQuery(Database.java:505)
at
org.apache.manifoldcf.core.database.Database$QueryCacheExecutor.create(Database.java:1152)
at
org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:144)
at
org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:168)
at
org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performModification(DBInterfacePostgreSQL.java:639)
at
org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.reindexTableInternal(DBInterfacePostgreSQL.java:1284)
at
org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.noteModifications(DBInterfacePostgreSQL.java:1356)
at
org.apache.manifoldcf.core.database.BaseTable.noteModifications(BaseTable.java:299)
at
org.apache.manifoldcf.crawler.jobs.JobQueue.deleteIngestedDocumentIdentifiers(JobQueue.java:565)
at
org.apache.manifoldcf.crawler.jobs.JobManager.deleteIngestedDocumentIdentifiers(JobManager.java:789)
at
org.apache.manifoldcf.crawler.system.DocumentDeleteThread.run(DocumentDeleteThread.java:176)
"Document delete stuffer thread" daemon prio=6 tid=0x0565fc00 nid=0x16b0 in
Object.wait() [0x06c9f000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:485)
at
org.apache.manifoldcf.core.lockmanager.LockObject.enterWriteLock(LockObject.java:111)
- locked <0x260f26f8> (a
org.apache.manifoldcf.core.lockmanager.LockObject)
at
org.apache.manifoldcf.core.lockmanager.LockManager.enterWriteCriticalSection(LockManager.java:1459)
at
org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.noteModifications(DBInterfacePostgreSQL.java:1322)
at
org.apache.manifoldcf.core.database.BaseTable.noteModifications(BaseTable.java:299)
at
org.apache.manifoldcf.crawler.jobs.JobQueue.setDeletingStatus(JobQueue.java:718)
at
org.apache.manifoldcf.crawler.jobs.JobManager.getNextDeletableDocuments(JobManager.java:1227)
at
org.apache.manifoldcf.crawler.system.DocumentDeleteStufferThread.run(DocumentDeleteStufferThread.java:105)
{code}
So it looks like our reindexing locks are what are hanging the system - at
least in this case.
> Postgresql load test job delete document cleanup fails sometimes and leaves
> orphaned documents
> ----------------------------------------------------------------------------------------------
>
> Key: CONNECTORS-279
> URL: https://issues.apache.org/jira/browse/CONNECTORS-279
> Project: ManifoldCF
> Issue Type: Bug
> Components: Framework crawler agent
> Affects Versions: ManifoldCF 0.4
> Reporter: Karl Wright
> Assignee: Karl Wright
> Fix For: ManifoldCF 0.4
>
>
> Running the postgresql load test on my laptop, I was surprised when the test
> did not finish. The UI indicated that the job was being deleted, but there
> were 49,000 documents and that number was not moving. Further inspection
> yielded the following:
> - Job was in the "DELETING" state
> - Documents were in the "BEINGDELETED" state
> - No activity of any kind ongoing
> The log had no errors.
> It was impossible to get a thread dump, but a cursory inspection of the code
> indicated that either the documents were being marked as "BEINGDELETED" but
> were not actually being placed on the in-memory queue, or the delete threads
> were picking up the documents and somehow avoiding marking them as being
> processed.
> Also, probably unrelated, the Document Status report listed these documents
> as having a status of "Being removed" and a state of "Unknown". The
> "Unknown" should have been a "Deleting". Since the extended WHEN... ELSE
> clause has a reasonable condition for the "Deleting" answer, it's hard to see
> how this could have occurred either.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira