Hoss Man created SOLR-10234:
-------------------------------
Summary: "Too many open files" in distrib tests due to fixed
HandleLimitFS (regardless of num nodes in test)
Key: SOLR-10234
URL: https://issues.apache.org/jira/browse/SOLR-10234
Project: Solr
Issue Type: Bug
Security Level: Public (Default Security Level. Issues are Public)
Reporter: Hoss Man
I just got a failure from BasicDistributedZkTest on master
(acb185b2dc7522e6a4fa55d54e82910736668f8d) that caught my attention -- the
reported failure was "Remote error message: Exception writing document id 57 to
the index; possible analysis error.", but digging into the logs, the root cause
was "Too many open files" coming from the mock
{{HandleLimitFS}} class we have...
{noformat}
[junit4] 2> 495598 ERROR (qtp155652658-4405) [ ]
o.a.s.h.RequestHandlerBase java.nio.file.FileSystemException:
/home/jenkins/lucene-solr/solr/build/solr-core/test/J1/temp/solr.cloud.BasicDistributedZkTest_8D04773C07230D3B-001/index-NIOFSDirectory-002/_o_Memory_0.mdvm:
Too many open files
[junit4] 2> at
org.apache.lucene.mockfile.HandleLimitFS.onOpen(HandleLimitFS.java:48)
[junit4] 2> at
org.apache.lucene.mockfile.HandleTrackingFS.callOpenHook(HandleTrackingFS.java:81)
[junit4] 2> at
org.apache.lucene.mockfile.HandleTrackingFS.newOutputStream(HandleTrackingFS.java:160)
[junit4] 2> at
java.base/java.nio.file.Files.newOutputStream(Files.java:218)
[junit4] 2> at
org.apache.lucene.store.FSDirectory$FSIndexOutput.<init>(FSDirectory.java:413)
[junit4] 2> at
org.apache.lucene.store.FSDirectory$FSIndexOutput.<init>(FSDirectory.java:409)
[junit4] 2> at
org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:253)
[junit4] 2> at
org.apache.lucene.store.MockDirectoryWrapper.createOutput(MockDirectoryWrapper.java:665)
...
[junit4] 2> NOTE: reproduce with: ant test
-Dtestcase=BasicDistributedZkTest -Dtests.method=test
-Dtests.seed=8D04773C07230D3B -Dtests.slow=true -Dtests.locale=en-ER
-Dtests.timezone=Europe/Volgograd -Dtests.asserts=true
-Dtests.file.encoding=UTF-8
[junit4] ERROR 259s J1 | BasicDistributedZkTest.test <<<
{noformat}
...what concerns me in particular about this is that it's coming from a
distributed test involving multiple "nodes" (all using the same
randomized similarity) writing to the same "file://" filesystem in the same
JVM -- but {{TestRuleTemporaryFilesCleanup}} seems to be initializing the
filesystem with a fixed {{MAX_OPEN_FILES = 2048}}, regardless of how many
nodes the test starts.
So perhaps all (distributed/cloud) Solr tests should use
{{SuppressFileSystems}} to ensure we don't get false failures like this?
Or perhaps we should enhance the way we use {{HandleLimitFS}} in our test
scaffolding so that we can give each Solr node its own mock filesystem (with
its own {{MAX_OPEN_FILES}} limit)?
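
To make the concern concrete, here is a rough, self-contained sketch of the arithmetic. It is a toy model only: {{HandleBudget}} is a hypothetical stand-in and not the real {{HandleLimitFS}}, and the node count (8) and files-per-node figure (300) are made-up numbers for illustration. Only the 2048 limit comes from the test scaffolding mentioned above.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class HandleBudgetSketch {

    /** Hypothetical stand-in for a handle-limiting filesystem; NOT Lucene's HandleLimitFS. */
    static final class HandleBudget {
        private final int limit;
        private final AtomicInteger open = new AtomicInteger();

        HandleBudget(int limit) {
            this.limit = limit;
        }

        /** Returns true if a handle was granted, false if the limit would be exceeded. */
        boolean tryOpen() {
            if (open.incrementAndGet() > limit) {
                open.decrementAndGet(); // roll back the failed attempt
                return false;
            }
            return true;
        }
    }

    /** Every node draws from ONE budget, as when all nodes share one mock filesystem. */
    static int rejectedWithSharedBudget(int nodes, int filesPerNode, int limit) {
        HandleBudget shared = new HandleBudget(limit);
        int rejected = 0;
        for (int n = 0; n < nodes; n++) {
            for (int f = 0; f < filesPerNode; f++) {
                if (!shared.tryOpen()) {
                    rejected++;
                }
            }
        }
        return rejected;
    }

    /** Each node gets its OWN budget, as in the per-node mock filesystem idea. */
    static int rejectedWithPerNodeBudget(int nodes, int filesPerNode, int limit) {
        int rejected = 0;
        for (int n = 0; n < nodes; n++) {
            HandleBudget own = new HandleBudget(limit);
            for (int f = 0; f < filesPerNode; f++) {
                if (!own.tryOpen()) {
                    rejected++;
                }
            }
        }
        return rejected;
    }

    public static void main(String[] args) {
        int limit = 2048;       // the fixed MAX_OPEN_FILES value from the report
        int nodes = 8;          // assumed node count, for illustration only
        int filesPerNode = 300; // assumed open files per node, for illustration only
        System.out.println("shared budget rejections:   "
            + rejectedWithSharedBudget(nodes, filesPerNode, limit));
        System.out.println("per-node budget rejections: "
            + rejectedWithPerNodeBudget(nodes, filesPerNode, limit));
    }
}
```

In this model no single node comes anywhere near the limit, yet the shared budget still rejects opens once the nodes' combined total passes 2048. That is the kind of false failure a per-node limit (or suppressing the mock filesystem for distributed tests) would avoid.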