[ https://issues.apache.org/jira/browse/ASTERIXDB-1146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14968204#comment-14968204 ]
Ian Maxon commented on ASTERIXDB-1146:
--------------------------------------
This may be a silly question, but did you try adjusting
"storage.buffercache.maxopenfiles" in asterix-configuration.xml? We set it to
an absurdly high value by default, and the buffer cache closes file handles
lazily rather than eagerly. Unless it is set low enough that the NC can never
exceed the real limit reported by ulimit, this can easily happen.
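
To make the comparison concrete, here is a minimal sketch (Python, Unix only) that reads the soft open-file limit of the current process and checks whether a given cap would fit under it. The function names, the `reserved` headroom of 256 descriptors, and the placeholder value for the configured cap are all assumptions for illustration; none of them come from AsterixDB.

```python
import resource


def soft_fd_limit():
    """Return the soft RLIMIT_NOFILE of this process, or None if unlimited."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return None if soft == resource.RLIM_INFINITY else soft


def fits_under_limit(max_open_files, soft_limit, reserved=256):
    """True if the cap leaves `reserved` descriptors for everything else.

    `reserved` is a hypothetical headroom for sockets, logs and temp files.
    """
    if soft_limit is None:  # no limit, so nothing to exceed
        return True
    return max_open_files + reserved <= soft_limit


if __name__ == "__main__":
    limit = soft_fd_limit()
    # Illustrative placeholder only; read the real value from your own config.
    configured = 2147483647
    print("soft limit:", limit, "fits:", fits_under_limit(configured, limit))
```

The limit is per process, so the script only says something useful when run as the same user and from the same environment that starts the NC.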
> Cleaning up left-overs once the query is done - External datasets
> ------------------------------------------------------------------
>
> Key: ASTERIXDB-1146
> URL: https://issues.apache.org/jira/browse/ASTERIXDB-1146
> Project: Apache AsterixDB
> Issue Type: Bug
> Components: AsterixDB
> Reporter: Pouria
> Assignee: Abdullah Alamoudi
> Priority: Minor
>
> Running queries that use external datasets for a long time, without
> restarting the AsterixDB instance in between, causes the number of open files
> to grow, which can eventually break the system.
> Using the 'lsof' command, it seems that there are left-over 'ESTABLISHED' TCP
> connections between the NC and the Datanode:
> java 7576 pouria 292u IPv4 393409691 0t0 TCP asterix-10.ics.uci.edu:46965->asterix-10.ics.uci.edu:50010 (CLOSE_WAIT)
> java 7576 pouria 293u IPv4 393412907 0t0 TCP asterix-10.ics.uci.edu:47005->asterix-10.ics.uci.edu:50010 (ESTABLISHED)
> …
> java 32205 pouria 576u IPv4 393415126 0t0 TCP asterix-10.ics.uci.edu:50010->asterix-10.ics.uci.edu:47056 (ESTABLISHED)
> java 32205 pouria 586u IPv4 393414645 0t0 TCP asterix-10.ics.uci.edu:50010->asterix-10.ics.uci.edu:47045 (ESTABLISHED)
> …
> Here is the error from the CC logs when the system breaks:
> org.apache.hyracks.api.exceptions.HyracksDataException: org.apache.hyracks.api.exceptions.HyracksDataException: org.apache.hyracks.api.exceptions.HyracksDataException: java.io.IOException: Too many open files
> at org.apache.hyracks.control.common.utils.ExceptionUtils.setNodeIds(ExceptionUtils.java:45)
> at org.apache.hyracks.control.nc.Task.run(Task.java:312)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hyracks.api.exceptions.HyracksDataException: org.apache.hyracks.api.exceptions.HyracksDataException: java.io.IOException: Too many open files
> at org.apache.hyracks.control.nc.Task.pushFrames(Task.java:358)
> at org.apache.hyracks.control.nc.Task.run(Task.java:290)
> ... 3 more
> Caused by: org.apache.hyracks.api.exceptions.HyracksDataException: java.io.IOException: Too many open files
> at org.apache.hyracks.control.nc.io.IOManager.createWorkspaceFile(IOManager.java:171)
> at org.apache.hyracks.control.nc.io.WorkspaceFileFactory.createManagedWorkspaceFile(WorkspaceFileFactory.java:39)
> at org.apache.hyracks.control.nc.Joblet.createManagedWorkspaceFile(Joblet.java:262)
> at org.apache.hyracks.dataflow.std.join.OptimizedHybridHashJoin.buildWrite(OptimizedHybridHashJoin.java:332)
> at org.apache.hyracks.dataflow.std.join.OptimizedHybridHashJoin.spillPartition(OptimizedHybridHashJoin.java:311)
> at org.apache.hyracks.dataflow.std.join.OptimizedHybridHashJoin.processTuple(OptimizedHybridHashJoin.java:237)
> at org.apache.hyracks.dataflow.std.join.OptimizedHybridHashJoin.build(OptimizedHybridHashJoin.java:215)
> at org.apache.hyracks.dataflow.std.join.OptimizedHybridHashJoinOperatorDescriptor$PartitionAndBuildActivityNode$1.nextFrame(OptimizedHybridHashJoinOperatorDescriptor.java:313)
> at org.apache.hyracks.control.nc.Task.pushFrames(Task.java:342)
> ... 4 more
> Caused by: java.io.IOException: Too many open files
> at java.io.UnixFileSystem.createFileExclusively(Native Method)
> at java.io.File.createNewFile(File.java:1006)
> at java.io.File.createTempFile(File.java:1989)
> at org.apache.hyracks.control.nc.io.IOManager.createWorkspaceFile(IOManager.java:169)
> ... 12 more
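
When watching whether the left-over connections quoted above keep piling up between queries, it can help to tally lsof lines by TCP state. This is a rough sketch, not part of the original report: the function name `tally_tcp_states` and the `SAMPLE` constant are made up for illustration, and the sample lines are the four lsof entries from the description with each wrapped entry joined onto one line.

```python
import re
from collections import Counter

# lsof prints the TCP state in parentheses at the end of the line.
STATE_AT_END = re.compile(r"\(([A-Z_0-9]+)\)\s*$")


def tally_tcp_states(lsof_text):
    """Count TCP connection states in lsof-style output."""
    counts = Counter()
    for line in lsof_text.splitlines():
        if "TCP" not in line:
            continue  # skip headers, ellipses and non-TCP descriptors
        match = STATE_AT_END.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# The four entries from the report, one per line.
SAMPLE = """\
java 7576 pouria 292u IPv4 393409691 0t0 TCP asterix-10.ics.uci.edu:46965->asterix-10.ics.uci.edu:50010 (CLOSE_WAIT)
java 7576 pouria 293u IPv4 393412907 0t0 TCP asterix-10.ics.uci.edu:47005->asterix-10.ics.uci.edu:50010 (ESTABLISHED)
java 32205 pouria 576u IPv4 393415126 0t0 TCP asterix-10.ics.uci.edu:50010->asterix-10.ics.uci.edu:47056 (ESTABLISHED)
java 32205 pouria 586u IPv4 393414645 0t0 TCP asterix-10.ics.uci.edu:50010->asterix-10.ics.uci.edu:47045 (ESTABLISHED)
"""

if __name__ == "__main__":
    for state, count in sorted(tally_tcp_states(SAMPLE).items()):
        print(state, count)
```

Running the same tally on fresh lsof output before and after a batch of external-dataset queries would show whether the counts return to their starting point once the queries finish.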