[
https://issues.apache.org/jira/browse/HBASE-4965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13165639#comment-13165639
]
stack commented on HBASE-4965:
------------------------------
On hadoop-qa being set to only 1024 fds, that's weird. We dump the ulimit
before the test starts and it shows:
{code}
Linux asf011.sp2.ygridcore.net 2.6.32-33-server #71-Ubuntu SMP Wed Jul 20 17:42:25 UTC 2011 x86_64 GNU/Linux
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 20
file size (blocks, -f) unlimited
pending signals (-i) 16382
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 60000
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 2048
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
60000
Running in Jenkins mode
{code}
... so 60k.
So I wonder where the disconnect between your finding and the ulimit output
is. Are we running as a different user after the ulimit is printed?
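One way to narrow this down might be to print the limit from inside the test JVM itself rather than from the wrapper shell. A minimal sketch, assuming a Sun/Oracle or OpenJDK JVM on a Unix host where the OS bean implements com.sun.management.UnixOperatingSystemMXBean; the class name FdLimitProbe is hypothetical and not part of the attached patch:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical helper for illustration only; not part of 4965_all.patch.
class FdLimitProbe {

  /**
   * Returns {open, max} file descriptor counts as seen by this JVM,
   * or null when the platform bean does not expose them (e.g. non-Unix).
   */
  static long[] fdCounts() {
    OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
    if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
      com.sun.management.UnixOperatingSystemMXBean unix =
          (com.sun.management.UnixOperatingSystemMXBean) os;
      return new long[] {
        unix.getOpenFileDescriptorCount(), unix.getMaxFileDescriptorCount()
      };
    }
    return null;
  }

  public static void main(String[] args) {
    long[] counts = fdCounts();
    if (counts == null) {
      System.out.println("fd counts not available on this JVM/platform");
    } else {
      System.out.println("open fds: " + counts[0] + ", max fds: " + counts[1]);
    }
  }
}
```

If the max printed here differs from the 60000 in the ulimit dump, the limit is presumably being changed somewhere between the dump and the JVM launch.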
I love that leaks report. That's excellent.
Trying the patch locally....
> Monitor the open file descriptors and the threads counters during the unit
> tests
> --------------------------------------------------------------------------------
>
> Key: HBASE-4965
> URL: https://issues.apache.org/jira/browse/HBASE-4965
> Project: HBase
> Issue Type: Improvement
> Components: test
> Affects Versions: 0.94.0
> Environment: all
> Reporter: nkeywal
> Assignee: nkeywal
> Priority: Minor
> Attachments: 4965_all.patch, ResourceChecker.java,
> ResourceCheckerJUnitRule.java
>
>
> We're seeing a lot of issues with hadoop-qa related to threads or file
> descriptors.
> Monitoring these counters would ease the analysis.
> Note as well that
> - if we want to execute the tests in the same jvm (because the test is small
> or because we want to share the cluster) we can't afford to leak too many
> resources
> - if the tests leak, it's more difficult to detect a leak in the software
> itself.
> I attach the piece of code that I used. It requires two lines in a unit test
> class to:
> - before every test, count the threads and the open file descriptors
> - after every test, compare with the previous values.
> I ran it on some tests; we have for example:
> - client.TestMultiParallel#testBatchWithManyColsInOneRowGetAndPut: 232
> threads (was 231), 390 file descriptors (was 390). => TestMultiParallel uses
> 232 threads!
> - client.TestMultipleTimestamps#testWithColumnDeletes: 152 threads (was 151),
> 283 file descriptors (was 282).
> - client.TestAdmin#testCheckHBaseAvailableClosesConnection: 477 threads (was
> 294), 815 file descriptors (was 461)
> - client.TestMetaMigrationRemovingHTD#testMetaMigration: 149 threads (was
> 148), 310 file descriptors (was 307).
> These are not always leaks; we can expect some pooling effects. But still...
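For readers without the attachments, the before/after comparison described in the quoted issue could look roughly like the sketch below. This is not the attached ResourceChecker.java; the class ResourceSnapshot and its methods are hypothetical names, and the file descriptor count assumes a Unix JVM that exposes com.sun.management.UnixOperatingSystemMXBean (it falls back to -1 otherwise).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical sketch for illustration; not the code attached to HBASE-4965.
class ResourceSnapshot {
  final int threads;
  final long openFds; // -1 when the platform does not expose the count

  ResourceSnapshot(int threads, long openFds) {
    this.threads = threads;
    this.openFds = openFds;
  }

  /** Captures the live thread count and open fd count of the current JVM. */
  static ResourceSnapshot take() {
    int threads = ManagementFactory.getThreadMXBean().getThreadCount();
    long fds = -1;
    OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
    if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
      fds = ((com.sun.management.UnixOperatingSystemMXBean) os)
          .getOpenFileDescriptorCount();
    }
    return new ResourceSnapshot(threads, fds);
  }

  /** Formats this snapshot against an earlier one, in the style of the report above. */
  String reportAgainst(String testName, ResourceSnapshot before) {
    return testName + ": " + threads + " threads (was " + before.threads + "), "
        + openFds + " file descriptors (was " + before.openFds + ").";
  }

  public static void main(String[] args) throws Exception {
    ResourceSnapshot before = take();

    // Simulate a test that leaves a thread behind.
    Thread leaked = new Thread(() -> {
      try {
        Thread.sleep(60_000);
      } catch (InterruptedException ignored) {
        // exit quietly
      }
    });
    leaked.setDaemon(true); // daemon, so the JVM can still exit
    leaked.start();

    ResourceSnapshot after = take();
    System.out.println(after.reportAgainst("demo#leakyTest", before));
  }
}
```

In a JUnit test the two take() calls would sit in the before and after hooks, which is presumably what the two lines per test class mentioned in the issue wire up.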