[
https://issues.apache.org/jira/browse/HDFS-5461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Liang Xie updated HDFS-5461:
----------------------------
Attachment: HDFS-5461.txt
bq. It's because each open stream holds a buffer, and we have hundreds of open
streams?
I am not 100% sure, but I agree with you: this OOM is easy to reproduce
when there are lots of opened storefiles to be read (e.g. when compaction
can't catch up sometimes).
Oh, I see. It seems the fallback is only meaningful for a config like mine: a
big Xmx and a small MaxDirectMemorySize :)
I attached a patch that adds more logging about the in-use/pooled direct
buffer sizes. In my opinion, it could be useful to reset the log level to
"trace" online while the OOM is occurring. The patch also adds a simple
try/catch fallback to handle the OOM without introducing any config value;
to me, this way seems more reasonable :)
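To illustrate the try/catch fallback idea, here is a minimal sketch. The class and method names are hypothetical and not from the actual patch; the real change would live in the short-circuit read path around the DirectBufferPool allocation:

```java
import java.nio.ByteBuffer;

public class SsrFallbackSketch {
    /**
     * Try to allocate a direct buffer for a short-circuit read. On a
     * direct-memory OOM (e.g. MaxDirectMemorySize exhausted), return null
     * so the caller can fall back to the non-ssr read path instead of
     * failing the read outright.
     */
    static ByteBuffer tryAllocateDirect(int capacity) {
        try {
            return ByteBuffer.allocateDirect(capacity);
        } catch (OutOfMemoryError e) {
            // Direct memory exhausted; signal fallback rather than propagate.
            return null;
        }
    }

    public static void main(String[] args) {
        ByteBuffer buf = tryAllocateDirect(64 * 1024);
        System.out.println(buf != null ? "ssr" : "fallback to non-ssr");
    }
}
```

The point of catching OutOfMemoryError only around the direct-buffer allocation (and not more broadly) is that this particular error is recoverable here: the read can still proceed through the normal non-short-circuit path.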
> fallback to non-ssr(local short circuit reads) while oom detected
> -----------------------------------------------------------------
>
> Key: HDFS-5461
> URL: https://issues.apache.org/jira/browse/HDFS-5461
> Project: Hadoop HDFS
> Issue Type: Improvement
> Affects Versions: 3.0.0, 2.2.0
> Reporter: Liang Xie
> Attachments: HDFS-5461.txt
>
>
> Currently, the DirectBufferPool used by the ssr feature doesn't seem to have
> an upper-bound limit other than the MaxDirectMemorySize VM option, so there's
> a risk of encountering a direct-memory OOM. See HBASE-8143 for an example.
> IMHO, maybe we could improve it a bit:
> 1) detect an OOM (or a configured upper limit being reached) from the caller,
> then fall back to non-ssr
> 2) add a new metric for the current raw consumed direct memory size.
--
This message was sent by Atlassian JIRA
(v6.1#6144)