I believe svn 180008 should correct this problem, but I was never able
to exactly reproduce your error - I didn't make it to a linux machine.
I believe the problem was a timing related error where the system
was sometimes incorrectly reading past end of file. On my machine
I would see an EOF type error every 5-10 times I ran the test. After
the change I ran it 100 times with no error.
By the way just for reference how fast are processors on your machine,
and how many processors if is multi-cpu? It seems the error I fixed
would show up more easily on fast, multi-cpu machines.
Mike Matrigali wrote:
I will look into this, see if I can reproduce locally. Note that
lock timeouts in the derby.log are expected part of a successful run -
but the other 2 are of course wrong.
Getting a few stack traces into derby log is a good 1st step to
diagnosing "hangs". Also knowing if the hanging process is looping or
waiting is good info. It is JVM dependent but on unix there is usually
a signal you can send the JVM which will cause it to dump stack
traces of all the threads in the system. Getting 2 or 3 can often tell
if it is hanging or looping.
/mikem
Ole Solberg wrote:
I have observed that, on 1-one of the platforms we're running
regression tests on, the store/OnlineCompressTest.java fails.
See e.g.
http://www.multinet.no/~solberg/public/Apache/Derby/index.html/storemore_history.html
(storeall and derbyall ditto).
The platform showing the problem is Linux-2.4.20-31.9 i686-i686
(RedHat 9).
There seems to be three different behaviours:
DATA_UNKNOWN_PAGE_FORMAT, LOCK_TIMEOUT, test never terminates.
1. changeInstanceTo(CachedPage.java:315) throws DATA_UNKNOWN_PAGE_FORMAT:
----------------------------------------------------------
Database Class Loader started - derby.database.classpath=''
2005-06-01 11:50:53.078 GMT Thread[main,5,main] (XID = 2027143),
(SESSIONID = 0), (DATABASE = wombat), (DRDAID = null), Cleanup action
starting
2005-06-01 11:50:53.079 GMT Thread[main,5,main] (XID = 2027143),
(SESSIONID = 0), (DATABASE = wombat), (DRDAID = null), Failed
Statement is: insert into TEST1 values(?, ?, ?, ?, ?, ?)
ERROR XSDB1: Unknown page format at page Page(14,Container(0, 1057))
at
org.apache.derby.iapi.error.StandardException.newException(StandardException.java:315)
at
org.apache.derby.impl.store.raw.data.CachedPage.changeInstanceTo(CachedPage.java:315)
at
org.apache.derby.impl.store.raw.data.CachedPage.setIdentity(CachedPage.java:195)
at
org.apache.derby.impl.services.cache.CachedItem.takeOnIdentity(CachedItem.java:235)
at
org.apache.derby.impl.services.cache.Clock.addEntry(Clock.java:798)
2. lockObject(LockSet.java:414) throws LOCK_TIMEOUT:
----------------------------------------------------------
Database Class Loader started - derby.database.classpath=''
2005-05-31 07:16:50.336 GMT Thread[main,5,main] (XID = 156),
(SESSIONID = 0), (DATABASE = wombat), (DRDAID = null), Cleanup action
starting
2005-05-31 07:16:50.336 GMT Thread[main,5,main] (XID = 156),
(SESSIONID = 0), (DATABASE = wombat), (DRDAID = null), Failed
Statement is: call SYSCS_UTIL.SYSCS_INPLACE_COMPRESS_TABLE(?, ?, ?, ?, ?)
ERROR 40XL1: A lock could not be obtained within the time requested
at
org.apache.derby.iapi.error.StandardException.newException(StandardException.java:301)
at
org.apache.derby.impl.services.locks.LockSet.lockObject(LockSet.java:414)
at
org.apache.derby.impl.services.locks.SinglePool.lockAnObject(SinglePool.java:174)
at
org.apache.derby.impl.services.locks.SinglePool.lockObject(SinglePool.java:226)
at
org.apache.derby.impl.store.raw.xact.ContainerLocking3.lockContainer(ContainerLocking3.java:63)
at
org.apache.derby.impl.store.raw.data.BaseContainerHandle.useContainer(BaseContainerHandle.java:761)
at
org.apache.derby.impl.store.raw.data.BaseDataFileFactory.openContainer(BaseDataFileFactory.java:548)
at
org.apache.derby.impl.store.raw.data.BaseDataFileFactory.openContainer(BaseDataFileFactory.java:494)
at
org.apache.derby.impl.store.raw.xact.Xact.openContainer(Xact.java:1270)
at
org.apache.derby.impl.store.access.conglomerate.OpenConglomerate.init(OpenConglomerate.java:865)
3. Never terminates and have to kill the test! tail
OnlineCompressTest.tmp shows:
----------------------------------------------
Executing test: end checkPurgePhase50 row test.
Executing test: begin deleteAllRows,4000 row test, create = true.
Executing test: no delete case complete.
Executing test: delete all rows case succeeded.
Executing test: end deleteAllRows,4000 row test.
Executing test: begin deleteAllRows,4000 row test, create = false.
Executing test: no delete case complete.
Executing test: delete all rows case succeeded.
Executing test: end deleteAllRows,4000 row test.
Executing test: begin checkPurgePhase4000 row test, create = false.
I have been digging around in the code but must admit I haven't got
any closer to an explanation.
What is the best way to debug further?