[ https://issues.apache.org/jira/browse/HADOOP-15204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349121#comment-16349121 ]
Anu Engineer edited comment on HADOOP-15204 at 2/1/18 7:16 PM: --------------------------------------------------------------- [~ste...@apache.org] I did look at getLongBytes. The issue is that it does not provide the same code improvement as {{getTimeDuration}} and {{setTimeDuration}}. Let me support my assertion with some examples that you can look at. Let us suppose that I have a case to read the standard configured size of containers. This is a configurable parameter in Ozone. Here is how the code would look like with get/setStorageSize. {code:java} setStorageSize(OZONE_CONTAINER_SIZE, 1, GIGABYTES); // let us suppose we want to read back this config value as MBs. long valueInMB = getStorageSize(OZONE_CONTAINER_SIZE, "5 GB", MEGABYTES);{code} now let us see how we write this code using getLongBytes.. {code:java} // there is no symetric function called setLongBytes.. So I have to fall back to setLong // // Here is my first attempt. setLong(OZONE_CONTAINER_SIZE, 1048576); // This looks bad, since I am not sure if the OZONE_CONTAINER_SIZE is in MB/GB or what, // So the many keys get tagged as OZONE_CONTAINER_SIZE_GB // Now second attempt. setLong(OZONE_CONTAINER_SIZE_GB, 1048576); // But this is bad too, now I can not set container size in MB, since setLong takes a // whole number and not a fraction. // So now third attempt -- convert all fields to bytes setLong(OZONE_CONTAINER_SIZE_BYTES, 5368709120); // The default is 5 GG. {code} Before you think this is a made up example, this is part of changes that we tried which triggered this change. Now let us go to back get examples – {code:java} // getLongBytes forces us to write code in a certain way which does not match with // rest of code like getTimeDuration. For example, if I have a case where I want to read // the read the value in MB, and the ozone-default.xml is configured to GB. // the case that this line solves // getStorageSize(OZONE_CONTAINER_SIZE, "5 GB", MEGABYTES); long defaultValueInBytes = getDefaultValue(OZONE_CONTAINER_SIZE_DEFAULT); // in bytes. long valueInMB = BYTES.fromBytes(getLongBytes(OZONE_CONTAINER_SIZE, defaultValueInBytes)).toMBs();{code} Now imagine repeating this code many times all over the code base, plus, the BYTES.fromBytes(xxx).ToMBs() is a fuction from this patch. We need some equivalent code. In other words, I submit that these following factors make getLongBytes a less desirable function compared to getTimeDuration/getStorageSize. * lack of symetric function in getLongBytes - Without a set function it degenerates to a messy set of multiplication and division each time we have to use a storage Unit. With this those issue are cleanly isolated to a single place. * Lack of a formal storage unit - lack of a formal value like to TimeUnit/StorageUnit makes the code less readable (see the example setStorageSize) where the context also tells you the unit that we are operating with. * Does not suit our usage pattern - Ozone code follows the patterns in HDFS. Hence the default value mapping is not handled well in getLongBytes. * Units and conversion is needed - In ozone, there are several places where we convert these numbers. Users can specify Quota as a easy to read Storage Units, like 5 GB or 10 TB. We have a dedicated code for handling that, we use the storage numbers to specify how large the off-heap size should be, or as I am showing in this example, container sizes etc. * The getLongBytes by itself does not address the lack of StorageUnits, what this patch does is that it introduces a class that is very similar to {{TimeUnit}}. This makes the code more readable and easy to maintain. was (Author: anu): [~ste...@apache.org] I did look at getLongBytes. The issue is that it does not provide the same code improvement as {{getTimeDuration}} and {{setTimeDuration}}. Let me support my assertion with some examples that you can look at. Let us suppose that I have a case to read the standard configured size of containers. This is a configurable parameter in Ozone. Here is how the code would look like with get/setStorageSize. {code:java} setStorageSize(OZONE_CONTAINER_SIZE, 1, GIGABYTES); // let us suppose we want to read back this config value as MBs. long valueInMB = getStorageSize(OZONE_CONTAINER_SIZE, "5 GB", MEGABYTES);{code} now let us see how we write this code using getLongBytes.. {code:java} // there is no symetric function called setLongBytes.. So I have to fall back to setLong // // Here is my first attempt. setLong(OZONE_CONTAINER_SIZE, 1048576); // This looks bad, since I am not sure if the OZONE_CONTAINER_SIZE is in MB/GB or what, // So the many keys get tagged as OZONE_CONTAINER_SIZE_GB // Now second attempt. setLong(OZONE_CONTAINER_SIZE_GB, 1048576); // But this is bad too, now I can not set container size in MB, since setLong takes a // whole number and not a fraction. // So now third attempt -- convert all fields to bytes setLong(OZONE_CONTAINER_SIZE_BYTES, 5368709120); // The default is 5 GG. {code} Before you think this is a made up example, this is part of changes that we tried which triggered this change. Now let us go to back get examples -- {code:java} // getLongBytes forces us to write code in a certain way which does not match with // rest of code like getTimeDuration. For example, if I have a case where I want to read // the read the value in MB, and the ozone-default.xml is configured to GB. // the case that this line solves // getStorageSize(OZONE_CONTAINER_SIZE, "5 GB", MEGABYTES); long defaultValueInBytes = getDefaultValue(OZONE_CONTAINER_SIZE_DEFAULT); // in bytes. long valueInMB = BYTES.fromBytes(getLongBytes(OZONE_CONTAINER_SIZE, defaultValueInBytes)).toMBs();{code} Now imagine repeating this code many times all over the code base, plus, the BYTES.fromBytes(xxx).ToMBs() is a fuction from this patch. We need some equivalent code. In other words, I submit that these following factors make getLongBytes a less desirable function compared to getTimeDuration/getStorageSize. * lack of symetric function in getLongBytes - Without a set function it degenerates to a messy set of multiplication and division each time we have to use a storage Unit. With this those issue are cleanly isolated to a single place. * Lack of a format storage unit - lack of a formal value like to TimeUnit/StorageUnit makes the code less readable (see the example setStorageSize) where the context also tells you the unit that we are operating with. * Does not suit our usage pattern - Ozone code follows the patterns in HDFS. Hence the default value mapping is not handled well in getLongBytes. * Units and Conversation is needed - In ozone, there are several places where we convert these numbers. Users can specify Quota as a easy to read Storage Units, like 5 GB or 10 TB. We have a dedicated code for handling that, we use the storage numbers to specify how large the off-heap size should be, or as I am showing in this example, container sizes etc. * The getLongBytes by itself does not address the lack of StorageUnits, what this patch does is that it introduces a class that is very similar to {{TimeUnit}}. This makes the code more readable and easy to maintain. > Add Configuration API for parsing storage sizes > ----------------------------------------------- > > Key: HADOOP-15204 > URL: https://issues.apache.org/jira/browse/HADOOP-15204 > Project: Hadoop Common > Issue Type: Improvement > Components: conf > Affects Versions: 3.1.0 > Reporter: Anu Engineer > Assignee: Anu Engineer > Priority: Minor > Fix For: 3.1.0 > > Attachments: HADOOP-15204.001.patch > > > Hadoop has a lot of configurations that specify memory and disk size. This > JIRA proposes to add an API like {{Configuration.getStorageSize}} which will > allow users > to specify units like KB, MB, GB etc. This is JIRA is inspired by HDFS-8608 > and Ozone. Adding {{getTimeDuration}} support was a great improvement for > ozone code base, this JIRA hopes to do the same thing for configs that deal > with disk and memory usage. -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org