[ 
https://issues.apache.org/jira/browse/HADOOP-15204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349121#comment-16349121
 ] 

Anu Engineer edited comment on HADOOP-15204 at 2/1/18 7:16 PM:
---------------------------------------------------------------

[~ste...@apache.org] I did look at getLongBytes. The issue is that it does not 
provide the same code improvement as {{getTimeDuration}} and 
{{setTimeDuration}}.

Let me support my assertion with some examples that you can look at.

Let us suppose that I have a case to read the standard configured size of 
containers. This is a configurable parameter in Ozone.

Here is how the code would look like with get/setStorageSize.
{code:java}
setStorageSize(OZONE_CONTAINER_SIZE, 1, GIGABYTES);
// let us suppose we want to read back this config value as MBs.
long valueInMB = getStorageSize(OZONE_CONTAINER_SIZE, "5 GB", MEGABYTES);{code}
now let us see how we write this code using getLongBytes..
{code:java}
// there is no symetric function called setLongBytes.. So I have to fall back 
to setLong
// 
// Here is my first attempt.
setLong(OZONE_CONTAINER_SIZE, 1048576); 
// This looks bad, since I am not sure if the OZONE_CONTAINER_SIZE is in MB/GB 
or what, 
// So the many keys get tagged as OZONE_CONTAINER_SIZE_GB

// Now second attempt.
setLong(OZONE_CONTAINER_SIZE_GB, 1048576); 
// But this is bad too, now I can not set container size in MB, since setLong 
takes a
// whole number and not a fraction.
// So now third attempt -- convert all fields to bytes
setLong(OZONE_CONTAINER_SIZE_BYTES, 5368709120); // The default is 5 GG.
{code}
Before you think this is a made up example, this is part of changes that we 
tried which triggered this change.

Now let us go to back get examples –
{code:java}
// getLongBytes forces us to write code in a certain way which does not match 
with
// rest of code like getTimeDuration. For example, if I have a case where I 
want to read // the read the value in MB, and the ozone-default.xml is 
configured to GB.
// the case that this line solves 
// getStorageSize(OZONE_CONTAINER_SIZE, "5 GB", MEGABYTES);

long defaultValueInBytes = getDefaultValue(OZONE_CONTAINER_SIZE_DEFAULT); // in 
bytes.
long valueInMB = BYTES.fromBytes(getLongBytes(OZONE_CONTAINER_SIZE, 
defaultValueInBytes)).toMBs();{code}
 

Now imagine repeating this code many times all over the code base, plus, the 
BYTES.fromBytes(xxx).ToMBs() is a fuction from this patch. We need some 
equivalent code.

 

In other words, I submit that these following factors make getLongBytes a less 
desirable function compared to getTimeDuration/getStorageSize.
 * lack of symetric function in getLongBytes - Without a set function it 
degenerates to a messy set of multiplication and division each time we have to 
use a storage Unit. With this those issue are cleanly isolated to a single 
place.
 * Lack of a formal storage unit - lack of a formal value like to 
TimeUnit/StorageUnit makes the code less readable (see the example 
setStorageSize) where the context also tells you the unit that we are operating 
with.
 *  Does not suit our usage pattern - Ozone code follows the patterns in HDFS. 
Hence the default value mapping is not handled well in getLongBytes.

 * Units and conversion is needed - In ozone, there are several places where we 
convert these numbers. Users can specify Quota as a easy to read Storage Units, 
like 5 GB or 10 TB. We have a dedicated code for handling that, we use the 
storage numbers to specify how large the off-heap size should be, or as I am 
showing in this example, container sizes etc.
 * The getLongBytes by itself does not address the lack of StorageUnits, what 
this patch does is that it introduces a class that is very similar to 
{{TimeUnit}}. This makes the code more readable and easy to maintain.

 

 


was (Author: anu):
[~ste...@apache.org] I did look at getLongBytes. The issue is that it does not 
provide the same code improvement as {{getTimeDuration}} and 
{{setTimeDuration}}. 

Let me support my assertion with some examples that you can look at.

Let us suppose that I have a case to read the standard configured size of 
containers. This is a configurable parameter in Ozone.

Here is how the code would look like with get/setStorageSize.
{code:java}
setStorageSize(OZONE_CONTAINER_SIZE, 1, GIGABYTES);
// let us suppose we want to read back this config value as MBs.
long valueInMB = getStorageSize(OZONE_CONTAINER_SIZE, "5 GB", MEGABYTES);{code}
now let us see how we write this code using getLongBytes..
{code:java}
// there is no symetric function called setLongBytes.. So I have to fall back 
to setLong
// 
// Here is my first attempt.
setLong(OZONE_CONTAINER_SIZE, 1048576); 
// This looks bad, since I am not sure if the OZONE_CONTAINER_SIZE is in MB/GB 
or what, 
// So the many keys get tagged as OZONE_CONTAINER_SIZE_GB

// Now second attempt.
setLong(OZONE_CONTAINER_SIZE_GB, 1048576); 
// But this is bad too, now I can not set container size in MB, since setLong 
takes a
// whole number and not a fraction.
// So now third attempt -- convert all fields to bytes
setLong(OZONE_CONTAINER_SIZE_BYTES, 5368709120); // The default is 5 GG.
{code}
Before you think this is a made up example, this is part of changes that we 
tried which triggered this change.

Now let us go to back get examples --
{code:java}

// getLongBytes forces us to write code in a certain way which does not match 
with
// rest of code like getTimeDuration. For example, if I have a case where I 
want to read // the read the value in MB, and the ozone-default.xml is 
configured to GB.
// the case that this line solves 
// getStorageSize(OZONE_CONTAINER_SIZE, "5 GB", MEGABYTES);

long defaultValueInBytes = getDefaultValue(OZONE_CONTAINER_SIZE_DEFAULT); // in 
bytes.
long valueInMB = BYTES.fromBytes(getLongBytes(OZONE_CONTAINER_SIZE, 
defaultValueInBytes)).toMBs();{code}
 

Now imagine repeating this code many times all over the code base, plus, the 
BYTES.fromBytes(xxx).ToMBs() is a fuction from this patch. We need some 
equivalent code.

 

In other words, I submit that these following factors make getLongBytes a less 
desirable function compared to getTimeDuration/getStorageSize.
 * lack of symetric function in getLongBytes - Without a set function it 
degenerates to a messy set of multiplication and division each time we have to 
use a storage Unit. With this those issue are cleanly isolated to a single 
place.
 * Lack of a format storage unit - lack of a formal value like to 
TimeUnit/StorageUnit makes the code less readable (see the example 
setStorageSize) where the context also tells you the unit that we are operating 
with. 
 *  Does not suit our usage pattern - Ozone code follows the patterns in HDFS. 
Hence the default value mapping is not handled well in getLongBytes. 

 * Units and Conversation is needed - In ozone, there are several places where 
we convert these numbers. Users can specify Quota as a easy to read Storage 
Units, like 5 GB or 10 TB. We have a dedicated code for handling that, we use 
the storage numbers to specify how large the off-heap size should be, or as I 
am showing in this example, container sizes etc.
 * The getLongBytes by itself does not address the lack of StorageUnits, what 
this patch does is that it introduces a class that is very similar to 
{{TimeUnit}}. This makes the code more readable and easy to maintain.

 

 

> Add Configuration API for parsing storage sizes
> -----------------------------------------------
>
>                 Key: HADOOP-15204
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15204
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: conf
>    Affects Versions: 3.1.0
>            Reporter: Anu Engineer
>            Assignee: Anu Engineer
>            Priority: Minor
>             Fix For: 3.1.0
>
>         Attachments: HADOOP-15204.001.patch
>
>
> Hadoop has a lot of configurations that specify memory and disk size. This 
> JIRA proposes to add an API like {{Configuration.getStorageSize}} which will 
> allow users
> to specify units like KB, MB, GB etc. This is JIRA is inspired by HDFS-8608 
> and Ozone. Adding {{getTimeDuration}} support was a great improvement for 
> ozone code base, this JIRA hopes to do the same thing for configs that deal 
> with disk and memory usage.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org

Reply via email to