[
https://issues.apache.org/jira/browse/ACCUMULO-416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Christopher Tubbs resolved ACCUMULO-416.
----------------------------------------
Resolution: Abandoned
Closing this stale issue. If this is still a problem, please create a new issue
or PR at https://github.com/apache/accumulo
> reevaluate limiting the number of open files given HDFS improvements
> --------------------------------------------------------------------
>
> Key: ACCUMULO-416
> URL: https://issues.apache.org/jira/browse/ACCUMULO-416
> Project: Accumulo
> Issue Type: Improvement
> Components: tserver
> Reporter: Adam Fuchs
> Assignee: Keith Turner
> Priority: Major
>
> Tablet servers limit the number of files that can be opened for scans and for
> major compactions. The two main reasons for this limit were to reduce our
> impact on HDFS, primarily regarding connections to data nodes, and to limit
> our memory usage related to preloading file indexes. A third reason might be
> that disk thrashing could become a problem if we try to read from too many
> places at once.
> Two improvements may have made (or may soon make) this limit obsolete: HDFS
> now pools connections, and RFile now uses a multi-level index. With these
> improvements, is it reasonable to lift some of our open file restrictions?
> The tradeoff on the query side might be availability vs. overall resource
> usage. On the compaction side, the tradeoff is probably write replication vs.
> thrashing on reads. I think we can make an argument that queries should be
> available at almost any cost, but the compaction tradeoff is not as clear. We
> should test the efficiency of compacting a large number of files to get a
> better feel for how the two extremes affect read and write performance
> across the system.
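For context, the open-file limits discussed in the issue are exposed as tserver and table configuration properties. The property names and defaults below are assumptions based on later Accumulo releases and should be verified against the documentation for your version; this is a sketch, not a definitive reference:

```
# accumulo.properties -- property names and defaults assumed; check your
# release's configuration reference before relying on them.

# Maximum number of RFiles a tablet server will hold open for scans:
tserver.scan.files.open.max=100

# Per-tablet file count above which a merging minor compaction is triggered:
table.file.max=15
```

Raising `tserver.scan.files.open.max`, as the issue suggests, trades higher datanode connection and index-cache usage for fewer scans blocked waiting on file handles.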
--
This message was sent by Atlassian Jira
(v8.20.10#820010)