I believe svn 180008 should correct this problem, but I was never able
to exactly reproduce your error - I didn't make it to a linux machine.
I believe the problem was a timing related error where the system
was sometimes incorrectly reading past end of file.  On my machine
I would see an EOF type error every 5-10 times I ran the test.  After
the change I ran it 100 times with no error.

By the way just for reference how fast are processors on your machine,
and how many processors if is multi-cpu?  It seems the error I fixed
would show up more easily on fast, multi-cpu machines.

Mike Matrigali wrote:
I will look into this, see if I can reproduce locally.  Note that
lock timeouts in the derby.log are expected part of a successful run -
but the other 2 are of course wrong.

Getting a few stack traces into derby log is a good 1st step to diagnosing "hangs". Also knowing if the hanging process is looping or
waiting is good info.  It is JVM dependent but on unix there is usually
a signal you can send the JVM which will cause it to dump stack
traces of all the threads in the system.  Getting 2 or 3 can often tell
if it is hanging or looping.

/mikem

Ole Solberg wrote:

I have observed that, on 1-one of the platforms we're running regression tests on, the store/OnlineCompressTest.java fails. See e.g. http://www.multinet.no/~solberg/public/Apache/Derby/index.html/storemore_history.html
(storeall and derbyall ditto).

The platform showing the problem is Linux-2.4.20-31.9 i686-i686 (RedHat 9).

There seems to be three different behaviours: DATA_UNKNOWN_PAGE_FORMAT, LOCK_TIMEOUT, test never terminates.


1. changeInstanceTo(CachedPage.java:315) throws DATA_UNKNOWN_PAGE_FORMAT:
----------------------------------------------------------
Database Class Loader started - derby.database.classpath=''
2005-06-01 11:50:53.078 GMT Thread[main,5,main] (XID = 2027143), (SESSIONID = 0), (DATABASE = wombat), (DRDAID = null), Cleanup action starting 2005-06-01 11:50:53.079 GMT Thread[main,5,main] (XID = 2027143), (SESSIONID = 0), (DATABASE = wombat), (DRDAID = null), Failed Statement is: insert into TEST1 values(?, ?, ?, ?, ?, ?)
ERROR XSDB1: Unknown page format at page Page(14,Container(0, 1057))
at org.apache.derby.iapi.error.StandardException.newException(StandardException.java:315) at org.apache.derby.impl.store.raw.data.CachedPage.changeInstanceTo(CachedPage.java:315) at org.apache.derby.impl.store.raw.data.CachedPage.setIdentity(CachedPage.java:195) at org.apache.derby.impl.services.cache.CachedItem.takeOnIdentity(CachedItem.java:235) at org.apache.derby.impl.services.cache.Clock.addEntry(Clock.java:798)


2. lockObject(LockSet.java:414) throws LOCK_TIMEOUT:
----------------------------------------------------------
Database Class Loader started - derby.database.classpath=''
2005-05-31 07:16:50.336 GMT Thread[main,5,main] (XID = 156), (SESSIONID = 0), (DATABASE = wombat), (DRDAID = null), Cleanup action starting 2005-05-31 07:16:50.336 GMT Thread[main,5,main] (XID = 156), (SESSIONID = 0), (DATABASE = wombat), (DRDAID = null), Failed Statement is: call SYSCS_UTIL.SYSCS_INPLACE_COMPRESS_TABLE(?, ?, ?, ?, ?)
ERROR 40XL1: A lock could not be obtained within the time requested
at org.apache.derby.iapi.error.StandardException.newException(StandardException.java:301) at org.apache.derby.impl.services.locks.LockSet.lockObject(LockSet.java:414) at org.apache.derby.impl.services.locks.SinglePool.lockAnObject(SinglePool.java:174) at org.apache.derby.impl.services.locks.SinglePool.lockObject(SinglePool.java:226) at org.apache.derby.impl.store.raw.xact.ContainerLocking3.lockContainer(ContainerLocking3.java:63) at org.apache.derby.impl.store.raw.data.BaseContainerHandle.useContainer(BaseContainerHandle.java:761) at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.openContainer(BaseDataFileFactory.java:548) at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.openContainer(BaseDataFileFactory.java:494) at org.apache.derby.impl.store.raw.xact.Xact.openContainer(Xact.java:1270) at org.apache.derby.impl.store.access.conglomerate.OpenConglomerate.init(OpenConglomerate.java:865)


3. Never terminates and have to kill the test! tail OnlineCompressTest.tmp shows:
----------------------------------------------
Executing test: end checkPurgePhase50 row test.
Executing test: begin deleteAllRows,4000 row test, create = true.
Executing test: no delete case complete.
Executing test: delete all rows case succeeded.
Executing test: end deleteAllRows,4000 row test.
Executing test: begin deleteAllRows,4000 row test, create = false.
Executing test: no delete case complete.
Executing test: delete all rows case succeeded.
Executing test: end deleteAllRows,4000 row test.
Executing test: begin checkPurgePhase4000 row test, create = false.


I have been digging around in the code but must admit I haven't got any closer to an explanation.

What is the best way to debug further?






Reply via email to