[
https://issues.apache.org/jira/browse/ASTERIXDB-2948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17404652#comment-17404652
]
Ingo Müller commented on ASTERIXDB-2948:
----------------------------------------
You can find the query
[here|https://github.com/RumbleDB/iris-hep-benchmark-sqlpp/blob/master/queries/query-8/query.sqlpp]
and a tiny sample of the data
[here|https://github.com/RumbleDB/iris-hep-benchmark-sqlpp/tree/master/data].
(The JSON file and the Parquet file should represent the same data; both have
1k outer-level records. I ran into the problem with 53M records in a Parquet
file.)
This is the output of {{ulimit -a}}:
{noformat}
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 62162
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 4096
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
{noformat}
I guess a limit of 1024 open files is too strict, and AsterixDB is presumably
not expected to work under it. I am surprised that this is the default value
on Amazon Linux, though.
I'll try to run my experiments with an increased ulimit and report again.
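For anyone watching the same symptom: below is a minimal Java sketch (not
AsterixDB code; the class name {{FdCheck}} is made up) for checking how close
the JVM process is to its descriptor limit. It assumes a HotSpot/OpenJDK build
on a Unix-like OS, where the platform MXBean can be cast to
{{com.sun.management.UnixOperatingSystemMXBean}}:
{code:java}
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

import com.sun.management.UnixOperatingSystemMXBean;

// Hypothetical helper (not part of AsterixDB): prints the JVM's current and
// maximum file descriptor counts, i.e. the process-level view of "ulimit -n".
public class FdCheck {
    public static void main(String[] args) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean) {
            UnixOperatingSystemMXBean unix = (UnixOperatingSystemMXBean) os;
            System.out.println("open fds: " + unix.getOpenFileDescriptorCount()
                    + " / max: " + unix.getMaxFileDescriptorCount());
        } else {
            System.out.println("fd counts not exposed by this JVM/OS");
        }
    }
}
{code}
Running this periodically during the failing query should show whether the
count climbs toward the 1024 ceiling reported by {{ulimit -a}} above.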
> "Too many open files" on large data sets in Parquet/S3
> ------------------------------------------------------
>
> Key: ASTERIXDB-2948
> URL: https://issues.apache.org/jira/browse/ASTERIXDB-2948
> Project: Apache AsterixDB
> Issue Type: Bug
> Components: EXT - External data
> Affects Versions: 0.9.8
> Reporter: Ingo Müller
> Priority: Major
>
> When I run complex queries on a very large machine (96 vCPUs, 48 configured
> IO devices/partitions) with Parquet files on S3, I occasionally get the
> following error:
> {noformat}
> java.io.FileNotFoundException: /data/asterixdb/iodevice40/./ExternalSortGroupByRunGenerator13134601214093461962.waf (Too many open files)
> {noformat}
> This only happens beyond a certain data size; I think the smallest instance of
> the data set where I observed the error was around 0.5 TB. I have not been able
> to test these queries against files on HDFS or the local filesystem, since the
> data does not fit on the system's disk.
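For context on the quoted error: a small, self-contained Java sketch
(illustrative only, not how AsterixDB's run generator actually works; the
class name {{FdExhaustion}} and the {{"runfile"}}/{{".waf"}} names are made up
to echo the message above) showing how descriptor exhaustion surfaces on Linux
as a {{FileNotFoundException}} with "(Too many open files)":
{code:java}
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Illustrative repro of the failure mode: keep opening files without closing
// them until the per-process descriptor limit ("ulimit -n") is exhausted.
public class FdExhaustion {
    public static void main(String[] args) throws IOException {
        List<FileOutputStream> open = new ArrayList<>();
        try {
            while (true) {
                File f = File.createTempFile("runfile", ".waf");
                f.deleteOnExit();
                // Each unclosed stream pins one file descriptor.
                open.add(new FileOutputStream(f));
            }
        } catch (IOException e) {
            // On Linux this is typically reported as
            // java.io.FileNotFoundException: ... (Too many open files)
            System.out.println("failed after " + open.size()
                    + " open streams: " + e);
        } finally {
            for (FileOutputStream s : open) {
                s.close();
            }
        }
    }
}
{code}
With 48 IO devices and many parallel sort/group-by operators each holding run
files plus sockets to S3, a default limit of 1024 descriptors can plausibly be
exhausted well before the data itself is the bottleneck.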