Hi guys,
Since I started working on the index removal last week, I have been
seeing strange behaviour that I first attributed to some wrong
modification of mine. Today, as I was removing the last call to the
OneLevelIndex to replace it with the rdnIndex, the core-integ tests
started to hang.
I did a kill -3 to see where it was blocked, and here is what I got:
"main" prio=5 tid=7fd9db800800 nid=0x10d310000 waiting on condition [10d30d000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
    at java.lang.Thread.sleep(Native Method)
    at jdbm.helper.LRUCache.put(LRUCache.java:330)
    at jdbm.recman.SnapshotRecordManager.update(SnapshotRecordManager.java:401)
    at jdbm.btree.BPage.remove(BPage.java:605)
    at jdbm.btree.BPage.remove(BPage.java:611)
    at jdbm.btree.BTree.remove(BTree.java:464)
    at org.apache.directory.server.core.partition.impl.btree.jdbm.JdbmTable.remove(JdbmTable.java:741)
    - locked <7c226be90> (a org.apache.directory.server.core.partition.impl.btree.jdbm.JdbmTable)
    at org.apache.directory.server.core.partition.impl.btree.jdbm.JdbmRdnIndex.drop(JdbmRdnIndex.java:157)
    at org.apache.directory.server.core.partition.impl.btree.jdbm.JdbmRdnIndex.drop(JdbmRdnIndex.java:49)
    at org.apache.directory.server.core.partition.impl.btree.AbstractBTreePartition.delete(AbstractBTreePartition.java:891)
    ...
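As a side note, when sending a kill -3 is not convenient (for instance
from a test watchdog), a similar dump can be produced from inside the
JVM with the standard ThreadMXBean. This is only a minimal sketch: the
class name ThreadDumpSketch and its dump() method are made up for
illustration, and nothing like this exists in the server code.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;

/** Hypothetical helper, not part of ApacheDS or jdbm. */
class ThreadDumpSketch
{
    /** Returns the name, state and stack of every live thread as text. */
    static String dump()
    {
        StringBuilder sb = new StringBuilder();

        // true, true: also collect locked monitors and ownable synchronizers
        for ( ThreadInfo info : ManagementFactory.getThreadMXBean().dumpAllThreads( true, true ) )
        {
            sb.append( '"' ).append( info.getThreadName() ).append( "\" " )
                .append( info.getThreadState() ).append( '\n' );

            for ( StackTraceElement frame : info.getStackTrace() )
            {
                sb.append( "    at " ).append( frame ).append( '\n' );
            }
        }

        return sb.toString();
    }


    public static void main( String[] args )
    {
        System.out.println( dump() );
    }
}
```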
The associated code in LRUCache is :
    public void put( K key, V value, long newVersion, Serializer serializer,
        boolean neverReplace ) throws IOException, CacheEvictionException
    {
        ...
        while ( true )
        {
            ...
                else
                {
                    entry = this.findNewEntry( key, latchIndex );
                    ...
                }
            }
            catch ( CacheEvictionException e )
            {
                e.printStackTrace(); // Added for debug purposes
                sleepForFreeEntry = totalSleepTime < this.MAX_WRITE_SLEEP_TIME;
                ...
            }
            ...
            if ( sleepForFreeEntry )
            {
                try
                {
                    Thread.sleep( sleepInterval );
                    ....
                totalSleepTime += sleepInterval;
            }
            else
            {
                break;
            }
        }
Basically, we try to add a new element to the cache, the cache is full,
we then try to evict one entry, the eviction fails with a
CacheEvictionException, and we go to sleep for 600 seconds...
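To make the failure mode concrete, here is a minimal toy sketch of that
loop. It is NOT the jdbm LRUCache: the class EvictionSleepSketch, the
notion of "pinned" (non-evictable) entries, the unchecked exception and
the timing values are all assumptions made up for illustration. It is
single-threaded on purpose, so nothing can free an entry while put()
sleeps, which is the worst case.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy cache for illustration only; every name here is hypothetical. */
class EvictionSleepSketch
{
    /** Thrown when the cache is full and no entry could be evicted in time. */
    static class NoEvictableEntryException extends RuntimeException
    {
    }

    private final int capacity;
    private final long sleepIntervalMs;
    private final long maxWriteSleepMs;

    // Access-ordered map: iteration goes from least to most recently used.
    // The value tells whether the entry is pinned (cannot be evicted).
    private final Map<String, Boolean> pinnedByKey = new LinkedHashMap<>( 16, 0.75f, true );


    EvictionSleepSketch( int capacity, long sleepIntervalMs, long maxWriteSleepMs )
    {
        this.capacity = capacity;
        this.sleepIntervalMs = sleepIntervalMs;
        this.maxWriteSleepMs = maxWriteSleepMs;
    }


    /** Adds a key and returns how many milliseconds were spent sleeping. */
    long put( String key, boolean pinned )
    {
        long totalSleepMs = 0;

        while ( true )
        {
            // Room left, key already cached, or a victim was found: store it
            if ( pinnedByKey.containsKey( key ) || pinnedByKey.size() < capacity || evictOne() )
            {
                pinnedByKey.put( key, pinned );
                return totalSleepMs;
            }

            // Full and nothing evictable: give up once the sleep budget is spent
            if ( totalSleepMs >= maxWriteSleepMs )
            {
                throw new NoEvictableEntryException();
            }

            try
            {
                Thread.sleep( sleepIntervalMs );
            }
            catch ( InterruptedException e )
            {
                Thread.currentThread().interrupt();
                throw new IllegalStateException( e );
            }

            totalSleepMs += sleepIntervalMs;
        }
    }


    boolean contains( String key )
    {
        return pinnedByKey.containsKey( key );
    }


    /** Removes the least recently used entry that is not pinned, if any. */
    private boolean evictOne()
    {
        Iterator<Map.Entry<String, Boolean>> it = pinnedByKey.entrySet().iterator();

        while ( it.hasNext() )
        {
            if ( !it.next().getValue() )
            {
                it.remove();
                return true;
            }
        }

        return false;
    }


    public static void main( String[] args )
    {
        EvictionSleepSketch cache = new EvictionSleepSketch( 2, 10, 50 );
        cache.put( "a", true );
        cache.put( "b", true );

        try
        {
            cache.put( "c", false );
        }
        catch ( NoEvictableEntryException e )
        {
            System.out.println( "gave up: cache full and nothing evictable" );
        }
    }
}
```

With a capacity of 2 and two pinned entries, the third put() sleeps
until its budget is spent and then gives up. With a larger capacity the
same sequence goes through without sleeping at all, which would be
consistent with what I see when I increase the cache size.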
It happens every time, and my guess is that, because we now hit the
RdnIndex table way more often than before (simply because we no longer
call the OneLevelIndex), the cache fills up and entries are not released
fast enough.
As we don't set any size for the cache, its default size is 1024. For
some of the tests, this might not be enough, as we load a lot of entries
(typically the schema elements) plus many others that get added and
removed while running tests in revert mode.
If I increase the default size to 65536, the tests pass.
Ok, now, I have to admit I haven't - yet - studied the LRUCache code in
depth, and my analysis is just based on what I saw by quickly looking at
the code, the stack traces I have added and a few blind guesses.
However, I think we have a serious issue here. As far as I can tell, the
code itself is probably not responsible for this behaviour, but the way
we use it is.
Did I miss something? Is there anything we can do - except increasing
the cache size - to get the tests to pass?
I'm more concerned about what could occur in real life, when some users
load the server up to the point where it just stops responding...
Anyone ?
Thanks !
--
Regards,
Cordialement,
Emmanuel Lécharny
www.iktek.com