[ 
https://issues.apache.org/jira/browse/LUCENE-5612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-5612:
----------------------------------

    Attachment: LUCENE-5612-more-sophisticated-crusher.patch

Here is a more sophisticated crusher using Ant:
- I rewrote the socket code to use a timeout in the server: if no client 
connects within 2 seconds while the server waits in accept(), the server 
shuts down silently.
- Each client acquires the lock a fixed number of times (2000) - see below!
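
The server-side timeout described in the first point can be sketched roughly 
as follows. This is not the code from the attached patch; it is a minimal, 
self-contained illustration that assumes the timeout is implemented with 
ServerSocket.setSoTimeout(). The class name AcceptTimeoutSketch and the 
methods serveUntilSilent and demo are hypothetical names chosen for this 
example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical sketch, NOT the LockVerifyServer code from the patch.
class AcceptTimeoutSketch {

    /** Serves clients until accept() sees no new connection for timeoutMillis. */
    static int serveUntilSilent(ServerSocket server, int timeoutMillis) throws IOException {
        // With SO_TIMEOUT set, accept() throws SocketTimeoutException
        // instead of blocking forever.
        server.setSoTimeout(timeoutMillis);
        int served = 0;
        while (true) {
            try (Socket client = server.accept()) {
                client.getInputStream().read(); // consume the client's single byte
                served++;
            } catch (SocketTimeoutException e) {
                return served; // silence: shut down quietly
            }
        }
    }

    /** Starts a server on a free loopback port and connects `clients` times to it. */
    static int demo(int clients, int timeoutMillis) {
        try (ServerSocket server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress())) {
            int port = server.getLocalPort();
            Thread t = new Thread(() -> {
                for (int i = 0; i < clients; i++) {
                    // Each connect uses a fresh ephemeral port on the client side.
                    try (Socket s = new Socket(InetAddress.getLoopbackAddress(), port)) {
                        s.getOutputStream().write(i);
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                }
            });
            t.start();
            int served = serveUntilSilent(server, timeoutMillis);
            t.join();
            return served;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Served " + demo(3, 500) + " clients before the timeout.");
    }
}
```

Because the server stops by itself once the clients go quiet, the build does 
not need a separate step to kill it after the stress clients finish.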

The problem with the whole test, and the reason the number of connects is 
limited to 2000:
- Most operating systems have a limited number of ephemeral ports.
- If both clients connect something like 20000 times in a short time, 40000 
ephemeral ports are reserved (because they are only freed after some waiting 
time). This leads to errors like "out of buffer space" or "no more file 
descriptors" on Linux.

The attached patch also adds the Ant task to our "ant test". In core, 
"ant test" is redefined to run "common.test" (inherited) and then the new 
"test-lock-factory".

On Windows the test passes (as expected):
{noformat}
   [junit4]
   [junit4] JVM J0:     1.06 ..   183.75 =   182.69s
   [junit4] JVM J1:     1.06 ..   193.50 =   192.44s
   [junit4] JVM J2:     1.06 ..   183.84 =   182.78s
   [junit4] JVM J3:     1.06 ..   193.47 =   192.41s
   [junit4] Execution time total: 3 minutes 13 seconds
   [junit4] Tests summary: 391 suites, 3055 tests, 179 ignored (169 assumptions)
     [echo] 5 slowest tests:
[junit4:tophints] 206.37s | org.apache.lucene.index.TestIndexReaderClose
[junit4:tophints]  61.07s | org.apache.lucene.codecs.compressing.TestCompressingTermVectorsFormat
[junit4:tophints]  37.95s | org.apache.lucene.index.TestConcurrentMergeScheduler
[junit4:tophints]  36.55s | org.apache.lucene.index.TestIndexWriterMerging
[junit4:tophints]  31.86s | org.apache.lucene.search.TestSloppyPhraseQuery

-check-totals:

common.test:

test-lock-factory:
[LockVerifyServer] Ready on port 51234...
[LockStressTest1] 0.0% done.
[LockStressTest2] 0.0% done.
[LockStressTest2] 50.0% done.
[LockStressTest1] 50.0% done.
[LockStressTest2] Finished 2000 tries.
[LockStressTest1] Finished 2000 tries.
[LockVerifyServer] Stopped server after 2000 seconds silence.

test:

BUILD SUCCESSFUL
Total time: 3 minutes 37 seconds
{noformat}

On Linux it fails:

{noformat}
test-lock-factory:
[LockVerifyServer] Ready on port 51234...
[LockStressTest2] 0.0% done.
[LockStressTest2] Exception in thread "main" java.lang.RuntimeException: lock was double acquired
[LockStressTest2]       at org.apache.lucene.store.VerifyingLockFactory$CheckedLock.verify(VerifyingLockFactory.java:63)
[LockStressTest2]       at org.apache.lucene.store.VerifyingLockFactory$CheckedLock.obtain(VerifyingLockFactory.java:74)
[LockStressTest2]       at org.apache.lucene.store.LockStressTest.main(LockStressTest.java:93)
[LockStressTest1] 0.0% done.
[LockStressTest1] Exception in thread "main" java.lang.RuntimeException: java.net.SocketException: Connection reset
[LockStressTest1]       at org.apache.lucene.store.VerifyingLockFactory$CheckedLock.verify(VerifyingLockFactory.java:66)
[LockStressTest1]       at org.apache.lucene.store.VerifyingLockFactory$CheckedLock.close(VerifyingLockFactory.java:91)
[LockStressTest1]       at org.apache.lucene.store.LockStressTest.main(LockStressTest.java:100)
[LockStressTest1] Caused by: java.net.SocketException: Connection reset
[LockStressTest1]       at java.net.SocketInputStream.read(SocketInputStream.java:196)
[LockStressTest1]       at java.net.SocketInputStream.read(SocketInputStream.java:122)
[LockStressTest1]       at java.net.SocketInputStream.read(SocketInputStream.java:210)
[LockStressTest1]       at org.apache.lucene.store.VerifyingLockFactory$CheckedLock.verify(VerifyingLockFactory.java:61)
[LockStressTest1]       ... 2 more
[LockVerifyServer] Exception in thread "main" java.lang.IllegalStateException: [0s]  ERROR: id 2 got lock, but 1 already holds the lock
[LockVerifyServer]      at org.apache.lucene.store.LockVerifyServer.main(LockVerifyServer.java:76)

BUILD FAILED
/home/thetaphi/Desktop/trunk-lusolr1/lucene/core/build.xml:162: 
Java returned: 1
Java returned: 1
Java returned: 1
{noformat}

> LockStressTest fails always with NativeFSLockFactory
> ----------------------------------------------------
>
>                 Key: LUCENE-5612
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5612
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Robert Muir
>            Priority: Blocker
>             Fix For: 4.8
>
>         Attachments: LUCENE-5612-instant-crush.patch, 
> LUCENE-5612-instant-crush.patch, 
> LUCENE-5612-more-sophisticated-crusher.patch, LUCENE-5612-tester.patch, 
> LUCENE-5612-tester.patch, LUCENE-5612.patch
>
>
> I was looking at this, because i wanted to remove the static map inside 
> NativeFSLockFactory (no particular reason: it just smells bad, we require 
> java7, and you get overlappingexception as of java6 so its unnecessary).
> Before changing any code, i wanted to run lockstresstest first, just to 
> ensure it works: but it fails always. Simple works fine always.
> Exception in thread "main" java.lang.RuntimeException: 
> java.lang.RuntimeException: lock was double acquired at 
> org.apache.lucene.store.VerifyingLockFactory$CheckedLock.verify(VerifyingLockFactory.java:67)


