[
https://issues.apache.org/jira/browse/HDFS-6988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157145#comment-14157145
]
Colin Patrick McCabe commented on HDFS-6988:
--------------------------------------------
I'm trying to understand the process for configuring this. First, there is the
decision of how big to make the ramdisk. That is something a sysadmin (or
management software) needs to do ahead of time, and it will clearly be
expressed as a number of bytes. Then there is setting
{{dfs.datanode.ram.disk.low.watermark.percent}}, which determines how much of
the ramdisk we will try to keep free. Then there is
{{dfs.datanode.ram.disk.low.watermark.replicas}}; I'm not sure when you would
set this one.
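To check my understanding, here is the arithmetic I have in mind for turning
these settings into bytes. The property names are from this JIRA, but the
concrete numbers and the {{max()}} at the end are my own guesses, not something
I've verified against the patch:
{code:java}
// Sketch of my understanding only: the property names are from this JIRA, but
// the example numbers and the max() at the end are my own assumptions.
public class WatermarkSketch {
  public static void main(String[] args) {
    long ramDiskCapacity  = 4L * 1024 * 1024 * 1024; // sysadmin-chosen ramdisk size, e.g. 4 GB

    int watermarkPercent  = 10;                      // dfs.datanode.ram.disk.low.watermark.percent
    int watermarkReplicas = 3;                       // dfs.datanode.ram.disk.low.watermark.replicas
    long blockSize        = 64L * 1024 * 1024;       // dfs.blocksize as configured on the DataNode

    long freeFromPercent  = ramDiskCapacity * watermarkPercent / 100; // ~410 MB
    long freeFromReplicas = watermarkReplicas * blockSize;            // 192 MB

    // My guess is the DataNode tries to keep the larger of the two free; if
    // that is not the intent, it is part of what I'd like spelled out.
    long targetFree = Math.max(freeFromPercent, freeFromReplicas);
    System.out.println("target free bytes = " + targetFree);
  }
}
{code}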
I don't like the fact that {{dfs.datanode.ram.disk.low.watermark.percent}} is
an int. In a year or two, we may find that 100 GB ramdisks are common. Then
the sysadmin gets a choice between specifying 0% (0 bytes free) and 1% (try to
keep 1 GB free). Making this a float would be better, I think...
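Concretely, with the stock {{Configuration}} accessors (the default values
below are made up, just to show the granularity):
{code:java}
import org.apache.hadoop.conf.Configuration;

// Granularity illustration: an integer percent moves the free-space target in
// 1 GB steps on a 100 GB ramdisk, while a float allows sub-GB targets.
public class PercentGranularity {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    long capacity = 100L * 1024 * 1024 * 1024;       // 100 GB ramdisk

    int pctInt = conf.getInt("dfs.datanode.ram.disk.low.watermark.percent", 1);
    long freeFromInt = capacity * pctInt / 100;      // only 0 GB, 1 GB, 2 GB, ...

    float pctFloat = conf.getFloat("dfs.datanode.ram.disk.low.watermark.percent", 0.5f);
    long freeFromFloat = (long) (capacity * (pctFloat / 100.0)); // 0.5f => ~512 MB

    System.out.println(freeFromInt + " bytes vs " + freeFromFloat + " bytes");
  }
}
{code}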
Why is {{dfs.datanode.ram.disk.low.watermark.replicas}} specified in terms of
number of replicas? Block size is a per-replica property: I could easily have
a client that writes 256 MB or 1 GB replicas, while the DataNode is configured
with {{dfs.blocksize}} at 64 MB. It's pretty common for formats like ORCFile
and Apache Parquet to use large blocks and seek around within them. This
property seems like it should be given in terms of bytes to avoid confusion.
It seems like we are translating it into a number of bytes before using it
anyway, so why not give the user access to that number directly?
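For example (the 1 GB client block size here is hypothetical, but nothing
prevents a client from choosing it):
{code:java}
// Why "number of replicas" is ambiguous: the same setting can mean very
// different amounts of memory depending on who wrote the replicas.
// The 1 GB client block size below is hypothetical.
public class ReplicaWatermarkAmbiguity {
  public static void main(String[] args) {
    int watermarkReplicas = 3;                   // dfs.datanode.ram.disk.low.watermark.replicas

    long dnBlockSize      = 64L * 1024 * 1024;   // dfs.blocksize as the DataNode sees it
    long clientBlockSize  = 1024L * 1024 * 1024; // a client writing 1 GB blocks

    System.out.println("what the DataNode reserves:      "
        + watermarkReplicas * dnBlockSize + " bytes (192 MB)");
    System.out.println("what those replicas can occupy:  "
        + watermarkReplicas * clientBlockSize + " bytes (3 GB)");
  }
}
{code}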
bq. I explained this earlier, a single number fails to work well for a range of
disks and makes configuration mandatory. What would you choose as the default
value of this single setting. Let's say we choose 1GB or higher. Then we are
wasting at least 25% of space on a 4GB RAM disk. Or we choose 512MB. Then we
are not evicting fast enough to keep up with multiple writers on a 50GB disk.
There seems to be a hidden assumption that the number of writers (or the speed
at which they're writing) will increase with the size of the ramdisk. I don't
see why that's true. In theory, I could have a system with a small ramdisk and
a high write rate, or a system with a huge ramdisk and a low write rate. It
seems like the amount of space I want to keep free should scale with the write
rate, not with the total ramdisk size?
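With some made-up numbers (none of them from the patch), the relationship I'm
describing looks roughly like this:
{code:java}
// Hypothetical numbers, just to show the shape of the relationship:
// headroom ~ aggregate write rate * eviction lag, not ~ ramdisk size.
public class HeadroomFromWriteRate {
  public static void main(String[] args) {
    long writeRateBytesPerSec = 500L * 1024 * 1024; // writers pushing 500 MB/s
    double evictionLagSeconds = 2.0;                // time for eviction to free space

    long headroomNeeded = (long) (writeRateBytesPerSec * evictionLagSeconds); // ~1 GB

    // The same ~1 GB is needed whether the ramdisk is 4 GB or 50 GB, which is
    // why a pure percentage of capacity doesn't feel like the right knob.
    System.out.println("headroom needed ~ " + headroomNeeded + " bytes");
  }
}
{code}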
> Add configurable limit for percentage-based eviction threshold
> --------------------------------------------------------------
>
> Key: HDFS-6988
> URL: https://issues.apache.org/jira/browse/HDFS-6988
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: datanode
> Affects Versions: HDFS-6581
> Reporter: Arpit Agarwal
> Fix For: HDFS-6581
>
> Attachments: HDFS-6988.01.patch, HDFS-6988.02.patch
>
>
> Per feedback from [~cmccabe] on HDFS-6930, we can make the eviction
> thresholds configurable. The hard-coded thresholds may not be appropriate for
> very large RAM disks.