Hi guys,

Since I started working on the index removals last week, I have been seeing strange behaviors, which I blamed on some wrong modification of mine. Today, as I was removing the last call to the OneLevelIndex to replace it with the rdnIndex, the core-integ tests started blocking.

I did a kill -3 to see where it blocks, and here is what I got:

"main" prio=5 tid=7fd9db800800 nid=0x10d310000 waiting on condition [10d30d000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at jdbm.helper.LRUCache.put(LRUCache.java:330)
        at jdbm.recman.SnapshotRecordManager.update(SnapshotRecordManager.java:401)
        at jdbm.btree.BPage.remove(BPage.java:605)
        at jdbm.btree.BPage.remove(BPage.java:611)
        at jdbm.btree.BTree.remove(BTree.java:464)
        at org.apache.directory.server.core.partition.impl.btree.jdbm.JdbmTable.remove(JdbmTable.java:741)
        - locked <7c226be90> (a org.apache.directory.server.core.partition.impl.btree.jdbm.JdbmTable)
        at org.apache.directory.server.core.partition.impl.btree.jdbm.JdbmRdnIndex.drop(JdbmRdnIndex.java:157)
        at org.apache.directory.server.core.partition.impl.btree.jdbm.JdbmRdnIndex.drop(JdbmRdnIndex.java:49)
        at org.apache.directory.server.core.partition.impl.btree.AbstractBTreePartition.delete(AbstractBTreePartition.java:891)
...
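
As an aside, roughly the same information as the kill -3 dump can be collected from inside the JVM with Thread.getAllStackTraces(). The sketch below is only an illustration: the class name SleepingThreadFinder and its method are hypothetical, not part of ApacheDS or jdbm. It lists the threads that are in TIMED_WAITING with a Thread sleep frame on top of their stack, like "main" above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, for illustration only (not part of ApacheDS or jdbm).
class SleepingThreadFinder
{
    // Returns the names of live threads that are TIMED_WAITING inside a Thread sleep call.
    static List<String> sleepingThreads()
    {
        List<String> names = new ArrayList<String>();

        for ( Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet() )
        {
            Thread thread = entry.getKey();
            StackTraceElement[] stack = entry.getValue();

            // The top frame is Thread.sleep on older JVMs and an internal
            // sleep* method of java.lang.Thread on newer ones.
            boolean topFrameIsSleep = stack.length > 0
                && "java.lang.Thread".equals( stack[0].getClassName() )
                && stack[0].getMethodName().startsWith( "sleep" );

            if ( thread.getState() == Thread.State.TIMED_WAITING && topFrameIsSleep )
            {
                names.add( thread.getName() );
            }
        }

        return names;
    }
}
```

Calling SleepingThreadFinder.sleepingThreads() from a watchdog thread while the tests hang should report "main" in a situation like the one shown here.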

The associated code in LRUCache is :

public void put( K key, V value, long newVersion, Serializer serializer,
        boolean neverReplace ) throws IOException, CacheEvictionException
    {
    ...
        while ( true )
        {
        ...
                else
                {
                    entry = this.findNewEntry( key, latchIndex );
                    ...
                }
            }
            catch ( CacheEvictionException e )
            {
                e.printStackTrace(); // Added for debug purposes
                sleepForFreeEntry = totalSleepTime < this.MAX_WRITE_SLEEP_TIME;

                ...
            }
            ...

            if ( sleepForFreeEntry )
            {
                try
                {
                    Thread.sleep( sleepInterval );
                ....
                totalSleepTime += sleepInterval;
            }
            else
            {
                break;
            }
        }

Basically, we try to add a new element to the cache, the cache is full, we try to evict an entry, the eviction fails with a CacheEvictionException, and we then keep sleeping and retrying for up to 600 seconds...
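
To make that loop concrete, here is a minimal, self-contained sketch of the suspected behaviour. It is not the jdbm LRUCache code: the class PinnedCacheSketch, its "pinned" flag (standing in for whatever keeps an entry from being evicted), the unchecked EvictionFailure, and the tiny timings are all made up for illustration. What happens once the loop gives up is elided in the excerpt above, so throwing at that point is an assumption too.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a bounded cache; NOT the jdbm LRUCache implementation.
class PinnedCacheSketch
{
    // Assumption: unchecked to keep the sketch short (jdbm uses a checked exception).
    static class EvictionFailure extends RuntimeException
    {
        EvictionFailure( String message )
        {
            super( message );
        }
    }

    private final int capacity;
    private final long maxSleepMillis;
    private final long sleepIntervalMillis;

    // key -> pinned flag, in insertion order; a pinned entry cannot be evicted
    private final Map<String, Boolean> entries = new LinkedHashMap<String, Boolean>();

    PinnedCacheSketch( int capacity, long maxSleepMillis, long sleepIntervalMillis )
    {
        this.capacity = capacity;
        this.maxSleepMillis = maxSleepMillis;
        this.sleepIntervalMillis = sleepIntervalMillis;
    }

    // Adds an entry and returns the total time (ms) spent sleeping while waiting for room.
    long put( String key, boolean pinned )
    {
        long totalSleep = 0;

        while ( true )
        {
            // There is room if the key is already cached, the cache is not full,
            // or an unpinned entry could be evicted.
            if ( entries.containsKey( key ) || entries.size() < capacity || evictOldestUnpinned() )
            {
                entries.put( key, pinned );
                return totalSleep;
            }

            // Nothing evictable: give up once the sleep budget is spent.
            if ( totalSleep >= maxSleepMillis )
            {
                throw new EvictionFailure( "no evictable entry after " + totalSleep + " ms" );
            }

            try
            {
                Thread.sleep( sleepIntervalMillis );
            }
            catch ( InterruptedException e )
            {
                Thread.currentThread().interrupt();
                throw new IllegalStateException( "interrupted while waiting for a free entry", e );
            }

            totalSleep += sleepIntervalMillis;
        }
    }

    // Marks an existing entry as evictable again.
    void unpin( String key )
    {
        if ( entries.containsKey( key ) )
        {
            entries.put( key, false );
        }
    }

    // Removes the oldest unpinned entry; returns false if every entry is pinned.
    private boolean evictOldestUnpinned()
    {
        for ( Iterator<Map.Entry<String, Boolean>> it = entries.entrySet().iterator(); it.hasNext(); )
        {
            if ( !it.next().getValue() )
            {
                it.remove();
                return true;
            }
        }

        return false;
    }
}
```

With a capacity of 2 and two pinned entries, a third put() sleeps until the time budget is used up and then fails. With a larger capacity the same put() returns at once, which would be consistent with a bigger cache hiding the problem rather than fixing it.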

It's systematic, and my guess is that, because we now pound the RdnIndex table far more often than before (simply because we no longer call the OneLevelIndex), the cache fills up and entries are not released fast enough.

As we don't set any size for the cache, it defaults to 1024. For some of the tests this might not be enough, as we load a lot of entries (typically the schema elements), plus many others that get added and removed while running the tests in revert mode.

If I increase the default size to 65536, the tests pass.

OK, I have to admit I haven't looked at the LRUCache code in depth yet, and my analysis is based only on a quick look at the code, the stack traces I added, and a few blind guesses. However, I think we have a serious issue here. As far as I can tell, the code itself is probably not responsible for this behaviour, but the way we use it is.

Did I miss something? Is there anything we can do, other than increasing the cache size, to get the tests passing?

I'm more concerned about what could happen in real life, when users load the server to the point where it just stops responding...

Anyone?

Thanks!

--
Regards,
Cordialement,
Emmanuel Lécharny
www.iktek.com
