jihoonson commented on a change in pull request #10203:
URL: https://github.com/apache/druid/pull/10203#discussion_r461406956



##########
File path: 
core/src/main/java/org/apache/druid/java/util/common/HumanReadableBytes.java
##########
@@ -128,71 +135,76 @@ public static long parse(String number, long nullValue)
       return nullValue;
     }
 
-    number = number.trim().toLowerCase(Locale.getDefault());
+    number = number.trim();
     if (number.length() == 0) {
       return nullValue;
     }
     return parseInner(number);
   }
 
-  private static long parseInner(String number)
+  private static long parseInner(String rawNumber)
   {
-    int index = number.length() - 1;
-    boolean isBinary = false;
-    char unit = number.charAt(index--);
+    String number = StringUtils.toLowerCase(rawNumber);
+    if (number.charAt(0) == '-') {
+      throw new IAE("Invalid format of number: %s. Negative value is not 
allowed.", rawNumber);
+    }
+
+    int lastDigitIndex = number.length() - 1;
+    boolean isBinaryByte = false;
+    char unit = number.charAt(lastDigitIndex--);
     if (unit == 'b') {
-      if (index < 2) {
-        throw new IAE("invalid format of number[%s]", number);
+      //unit ends with 'b' must be format of KiB/MiB/GiB/TiB/PiB, so at least 
3 extra characters are required
+      if (lastDigitIndex < 2) {
+        throw new IAE("Invalid format of number: %s", rawNumber);
       }
-      if (number.charAt(index--) != 'i') {
-        throw new IAE("invalid format of number[%s]", number);
+      if (number.charAt(lastDigitIndex--) != 'i') {
+        throw new IAE("Invalid format of number: %s", rawNumber);
       }
 
-      unit = number.charAt(index--);
-      isBinary = true;
+      unit = number.charAt(lastDigitIndex--);
+      isBinaryByte = true;
     }
+
     long base = 1;
     switch (unit) {
       case 'k':
-        base = isBinary ? 1024 : 1_000;
+        base = isBinaryByte ? 1024 : 1_000;
         break;
 
       case 'm':
-        base = isBinary ? 1024 * 1024 : 1_000_000;
+        base = isBinaryByte ? 1024 * 1024 : 1_000_000;
         break;
 
       case 'g':
-        base = isBinary ? 1024 * 1024 * 1024 : 1_000_000_000;
+        base = isBinaryByte ? 1024 * 1024 * 1024 : 1_000_000_000;
         break;
 
       case 't':
-        base = isBinary ? 1024 * 1024 * 1024 * 1024L : 1_000_000_000_000L;
+        base = isBinaryByte ? 1024 * 1024 * 1024 * 1024L : 1_000_000_000_000L;
         break;
 
       case 'p':
-        base = isBinary ? 1024L * 1024 * 1024 * 1024 * 1024 : 
1_000_000_000_000_000L;
+        base = isBinaryByte ? 1024L * 1024 * 1024 * 1024 * 1024 : 
1_000_000_000_000_000L;
         break;
 
       default:
+        lastDigitIndex++;
         if (!Character.isDigit(unit)) {
-          throw new IAE("invalid character in number[%s]", number);
+          throw new IAE("Invalid format of number: %s", rawNumber);
         }
         break;
     }
 
     try {
-      if (base > 1 && index >= 0) {
-        long value = Long.parseLong(number.substring(0, index + 1)) * base;
-        if (value < base) {
-          throw new IAE("number [%s] overflow", number);
-        }
-        return value;
-      } else {
-        return Long.parseLong(number);
+      long value = Long.parseLong(number.substring(0, lastDigitIndex + 1)) * 
base;
+      if (base > 1 && value < base) {
+        //for base == 1, overflow has been checked in parseLong
+        throw new IAE("Number overflow: %s", rawNumber);
       }
+      return value;
     }
     catch (NumberFormatException e) {
-      throw new IAE("invalid format of number[%s]", number);
+      throw new IAE("Invalid format of number: %s", rawNumber);

Review comment:
       It would be nice to log that the number might be out of range of `long`.

##########
File path: docs/configuration/human-readable-byte.md
##########
@@ -0,0 +1,83 @@
+---
+id: human-readable-byte
+title: "Human-readable Byte Configuration Reference"
+---
+
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one
+  ~ or more contributor license agreements.  See the NOTICE file
+  ~ distributed with this work for additional information
+  ~ regarding copyright ownership.  The ASF licenses this file
+  ~ to you under the Apache License, Version 2.0 (the
+  ~ "License"); you may not use this file except in compliance
+  ~ with the License.  You may obtain a copy of the License at
+  ~
+  ~   http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing,
+  ~ software distributed under the License is distributed on an
+  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  ~ KIND, either express or implied.  See the License for the
+  ~ specific language governing permissions and limitations
+  ~ under the License.
+  -->
+
+
+This page documents all configuration properties related to bytes.
+
+These properties can be configured through 2 ways: 
+1. a simple number in byte
+2. a number with a unit suffix
+
+## A number in byte
+
+Given the cache size is 3G, there's a configuration as below
+
+````
+druid.cache.sizeInBytes=3000000000 #3 * 1000_000_000
+````
+
+
+## A number with a unit suffix
+
+Sometimes value in bytes could be very large in Druid, it's counterintuitive 
to set value of these properties as above.
+Here comes another way, a number with a unit suffix.
+
+Given the total size of disks is 1T, the configuration can be
+
+````
+druid.segmentCache.locations=[{"path":"/segment-cache","maxSize":"1g"}]

Review comment:
       There is a mismatch in this example and the above statement. It should 
be `1t`.

##########
File path: 
core/src/main/java/org/apache/druid/java/util/common/HumanReadableBytes.java
##########
@@ -0,0 +1,210 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.java.util.common;
+
+import com.fasterxml.jackson.databind.annotation.JsonSerialize;
+
+import java.io.Serializable;
+
+@JsonSerialize(using = HumanReadableBytesSerializer.class)
+public class HumanReadableBytes implements Serializable

Review comment:
       I missed this before. Does `Serializable` do anything here? It doesn't 
seem so.

##########
File path: 
core/src/main/java/org/apache/druid/java/util/common/HumanReadableBytes.java
##########
@@ -0,0 +1,210 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.java.util.common;
+
+import com.fasterxml.jackson.databind.annotation.JsonSerialize;
+
+import java.io.Serializable;
+
+@JsonSerialize(using = HumanReadableBytesSerializer.class)

Review comment:
       The custom serializer doesn't seem necessary. AFAIT, this class never 
gets serialized. In deserializing, Jackson will magically finds the proper 
constructor depending on the parameter type.

##########
File path: docs/configuration/human-readable-byte.md
##########
@@ -0,0 +1,83 @@
+---
+id: human-readable-byte
+title: "Human-readable Byte Configuration Reference"
+---
+
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one
+  ~ or more contributor license agreements.  See the NOTICE file
+  ~ distributed with this work for additional information
+  ~ regarding copyright ownership.  The ASF licenses this file
+  ~ to you under the Apache License, Version 2.0 (the
+  ~ "License"); you may not use this file except in compliance
+  ~ with the License.  You may obtain a copy of the License at
+  ~
+  ~   http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing,
+  ~ software distributed under the License is distributed on an
+  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  ~ KIND, either express or implied.  See the License for the
+  ~ specific language governing permissions and limitations
+  ~ under the License.
+  -->
+
+
+This page documents all configuration properties related to bytes.
+
+These properties can be configured through 2 ways: 
+1. a simple number in byte
+2. a number with a unit suffix
+
+## A number in byte
+
+Given the cache size is 3G, there's a configuration as below
+
+````
+druid.cache.sizeInBytes=3000000000 #3 * 1000_000_000
+````
+
+
+## A number with a unit suffix
+
+Sometimes value in bytes could be very large in Druid, it's counterintuitive 
to set value of these properties as above.
+Here comes another way, a number with a unit suffix.
+
+Given the total size of disks is 1T, the configuration can be
+
+````
+druid.segmentCache.locations=[{"path":"/segment-cache","maxSize":"1g"}]
+````
+
+### Supported Units
+In the world of computer, a unit like `K` is ambiguous. It means 1000 or 1024 
in different contexts, for more information please see 
[Here](https://en.wikipedia.org/wiki/Binary_prefix).
+
+To make it clear, the base of units are defined as below
+
+| Unit | Description | Base |
+|---|---|---|
+| K | Kilo Decimal Byte | 1000 |

Review comment:
       nit: this doesn't have to be done in this PR, but I think it would be 
nice to allow `KB`, `MB`, etc as aliases for decimal units.

##########
File path: 
core/src/main/java/org/apache/druid/java/util/common/HumanReadableBytes.java
##########
@@ -0,0 +1,210 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.java.util.common;
+
+import com.fasterxml.jackson.databind.annotation.JsonSerialize;
+
+import java.io.Serializable;
+
+@JsonSerialize(using = HumanReadableBytesSerializer.class)
+public class HumanReadableBytes implements Serializable
+{
+  public static final HumanReadableBytes ZERO = new HumanReadableBytes(0L);
+
+  private long bytes;

Review comment:
       Can be final.

##########
File path: 
core/src/main/java/org/apache/druid/java/util/common/HumanReadableBytes.java
##########
@@ -128,71 +135,76 @@ public static long parse(String number, long nullValue)
       return nullValue;
     }
 
-    number = number.trim().toLowerCase(Locale.getDefault());
+    number = number.trim();
     if (number.length() == 0) {
       return nullValue;
     }
     return parseInner(number);
   }
 
-  private static long parseInner(String number)
+  private static long parseInner(String rawNumber)
   {
-    int index = number.length() - 1;
-    boolean isBinary = false;
-    char unit = number.charAt(index--);
+    String number = StringUtils.toLowerCase(rawNumber);
+    if (number.charAt(0) == '-') {
+      throw new IAE("Invalid format of number: %s. Negative value is not 
allowed.", rawNumber);
+    }
+
+    int lastDigitIndex = number.length() - 1;
+    boolean isBinaryByte = false;
+    char unit = number.charAt(lastDigitIndex--);
     if (unit == 'b') {
-      if (index < 2) {
-        throw new IAE("invalid format of number[%s]", number);
+      //unit ends with 'b' must be format of KiB/MiB/GiB/TiB/PiB, so at least 
3 extra characters are required
+      if (lastDigitIndex < 2) {
+        throw new IAE("Invalid format of number: %s", rawNumber);
       }
-      if (number.charAt(index--) != 'i') {
-        throw new IAE("invalid format of number[%s]", number);
+      if (number.charAt(lastDigitIndex--) != 'i') {
+        throw new IAE("Invalid format of number: %s", rawNumber);
       }
 
-      unit = number.charAt(index--);
-      isBinary = true;
+      unit = number.charAt(lastDigitIndex--);
+      isBinaryByte = true;
     }
+
     long base = 1;
     switch (unit) {
       case 'k':
-        base = isBinary ? 1024 : 1_000;
+        base = isBinaryByte ? 1024 : 1_000;
         break;
 
       case 'm':
-        base = isBinary ? 1024 * 1024 : 1_000_000;
+        base = isBinaryByte ? 1024 * 1024 : 1_000_000;
         break;
 
       case 'g':
-        base = isBinary ? 1024 * 1024 * 1024 : 1_000_000_000;
+        base = isBinaryByte ? 1024 * 1024 * 1024 : 1_000_000_000;
         break;
 
       case 't':
-        base = isBinary ? 1024 * 1024 * 1024 * 1024L : 1_000_000_000_000L;
+        base = isBinaryByte ? 1024 * 1024 * 1024 * 1024L : 1_000_000_000_000L;
         break;
 
       case 'p':
-        base = isBinary ? 1024L * 1024 * 1024 * 1024 * 1024 : 
1_000_000_000_000_000L;
+        base = isBinaryByte ? 1024L * 1024 * 1024 * 1024 * 1024 : 
1_000_000_000_000_000L;
         break;
 
       default:
+        lastDigitIndex++;
         if (!Character.isDigit(unit)) {
-          throw new IAE("invalid character in number[%s]", number);
+          throw new IAE("Invalid format of number: %s", rawNumber);
         }
         break;
     }
 
     try {
-      if (base > 1 && index >= 0) {
-        long value = Long.parseLong(number.substring(0, index + 1)) * base;
-        if (value < base) {
-          throw new IAE("number [%s] overflow", number);
-        }
-        return value;
-      } else {
-        return Long.parseLong(number);
+      long value = Long.parseLong(number.substring(0, lastDigitIndex + 1)) * 
base;
+      if (base > 1 && value < base) {
+        //for base == 1, overflow has been checked in parseLong
+        throw new IAE("Number overflow: %s", rawNumber);
       }
+      return value;
     }
     catch (NumberFormatException e) {
-      throw new IAE("invalid format of number[%s]", number);
+      throw new IAE("Invalid format of number: %s", rawNumber);

Review comment:
       I meant the error message should be understandable for Druid operators. 
The current message is confusing since it doesn't mention another possibility 
for out of range. Perhaps it's better to be `Invalid number format or out of 
range of long: %s`.

##########
File path: docs/configuration/human-readable-byte.md
##########
@@ -0,0 +1,83 @@
+---
+id: human-readable-byte
+title: "Human-readable Byte Configuration Reference"
+---
+
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one
+  ~ or more contributor license agreements.  See the NOTICE file
+  ~ distributed with this work for additional information
+  ~ regarding copyright ownership.  The ASF licenses this file
+  ~ to you under the Apache License, Version 2.0 (the
+  ~ "License"); you may not use this file except in compliance
+  ~ with the License.  You may obtain a copy of the License at
+  ~
+  ~   http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing,
+  ~ software distributed under the License is distributed on an
+  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  ~ KIND, either express or implied.  See the License for the
+  ~ specific language governing permissions and limitations
+  ~ under the License.
+  -->
+
+
+This page documents all configuration properties related to bytes.
+
+These properties can be configured through 2 ways: 
+1. a simple number in byte
+2. a number with a unit suffix
+
+## A number in byte
+
+Given the cache size is 3G, there's a configuration as below
+
+````
+druid.cache.sizeInBytes=3000000000 #3 * 1000_000_000
+````
+
+
+## A number with a unit suffix
+
+Sometimes value in bytes could be very large in Druid, it's counterintuitive 
to set value of these properties as above.
+Here comes another way, a number with a unit suffix.
+
+Given the total size of disks is 1T, the configuration can be
+
+````
+druid.segmentCache.locations=[{"path":"/segment-cache","maxSize":"1g"}]
+````
+
+### Supported Units
+In the world of computer, a unit like `K` is ambiguous. It means 1000 or 1024 
in different contexts, for more information please see 
[Here](https://en.wikipedia.org/wiki/Binary_prefix).
+
+To make it clear, the base of units are defined as below
+
+| Unit | Description | Base |
+|---|---|---|
+| K | Kilo Decimal Byte | 1000 |

Review comment:
       That's true, but I'm not sure how much it matters since `KB` and `KiB` 
are approximately same. Also, I bet there will be more people who try setting 
some value before reading the doc. It would be nice if it just works whatever 
they put.

##########
File path: 
core/src/main/java/org/apache/druid/java/util/common/HumanReadableBytes.java
##########
@@ -0,0 +1,210 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.java.util.common;
+
+import com.fasterxml.jackson.databind.annotation.JsonSerialize;
+
+import java.io.Serializable;
+
+@JsonSerialize(using = HumanReadableBytesSerializer.class)

Review comment:
       Hmm, I'm not sure why we have `ServerConfigTest.testSerde()` even though 
`ServerConfig` doesn't seem to be serialized/deserialized. @jon-wei do you have 
any idea?

##########
File path: 
core/src/main/java/org/apache/druid/java/util/common/HumanReadableBytes.java
##########
@@ -128,71 +135,76 @@ public static long parse(String number, long nullValue)
       return nullValue;
     }
 
-    number = number.trim().toLowerCase(Locale.getDefault());
+    number = number.trim();
     if (number.length() == 0) {
       return nullValue;
     }
     return parseInner(number);
   }
 
-  private static long parseInner(String number)
+  private static long parseInner(String rawNumber)
   {
-    int index = number.length() - 1;
-    boolean isBinary = false;
-    char unit = number.charAt(index--);
+    String number = StringUtils.toLowerCase(rawNumber);
+    if (number.charAt(0) == '-') {
+      throw new IAE("Invalid format of number: %s. Negative value is not 
allowed.", rawNumber);
+    }
+
+    int lastDigitIndex = number.length() - 1;
+    boolean isBinaryByte = false;
+    char unit = number.charAt(lastDigitIndex--);
     if (unit == 'b') {
-      if (index < 2) {
-        throw new IAE("invalid format of number[%s]", number);
+      //unit ends with 'b' must be format of KiB/MiB/GiB/TiB/PiB, so at least 
3 extra characters are required
+      if (lastDigitIndex < 2) {
+        throw new IAE("Invalid format of number: %s", rawNumber);
       }
-      if (number.charAt(index--) != 'i') {
-        throw new IAE("invalid format of number[%s]", number);
+      if (number.charAt(lastDigitIndex--) != 'i') {
+        throw new IAE("Invalid format of number: %s", rawNumber);
       }
 
-      unit = number.charAt(index--);
-      isBinary = true;
+      unit = number.charAt(lastDigitIndex--);
+      isBinaryByte = true;
     }
+
     long base = 1;
     switch (unit) {
       case 'k':
-        base = isBinary ? 1024 : 1_000;
+        base = isBinaryByte ? 1024 : 1_000;
         break;
 
       case 'm':
-        base = isBinary ? 1024 * 1024 : 1_000_000;
+        base = isBinaryByte ? 1024 * 1024 : 1_000_000;
         break;
 
       case 'g':
-        base = isBinary ? 1024 * 1024 * 1024 : 1_000_000_000;
+        base = isBinaryByte ? 1024 * 1024 * 1024 : 1_000_000_000;
         break;
 
       case 't':
-        base = isBinary ? 1024 * 1024 * 1024 * 1024L : 1_000_000_000_000L;
+        base = isBinaryByte ? 1024 * 1024 * 1024 * 1024L : 1_000_000_000_000L;
         break;
 
       case 'p':
-        base = isBinary ? 1024L * 1024 * 1024 * 1024 * 1024 : 
1_000_000_000_000_000L;
+        base = isBinaryByte ? 1024L * 1024 * 1024 * 1024 * 1024 : 
1_000_000_000_000_000L;
         break;
 
       default:
+        lastDigitIndex++;
         if (!Character.isDigit(unit)) {
-          throw new IAE("invalid character in number[%s]", number);
+          throw new IAE("Invalid format of number: %s", rawNumber);
         }
         break;
     }
 
     try {
-      if (base > 1 && index >= 0) {
-        long value = Long.parseLong(number.substring(0, index + 1)) * base;
-        if (value < base) {
-          throw new IAE("number [%s] overflow", number);
-        }
-        return value;
-      } else {
-        return Long.parseLong(number);
+      long value = Long.parseLong(number.substring(0, lastDigitIndex + 1)) * 
base;
+      if (base > 1 && value < base) {
+        //for base == 1, overflow has been checked in parseLong
+        throw new IAE("Number overflow: %s", rawNumber);
       }
+      return value;
     }
     catch (NumberFormatException e) {
-      throw new IAE("invalid format of number[%s]", number);
+      throw new IAE("Invalid format of number: %s", rawNumber);

Review comment:
       I meant the error message should be correct so that Druid operators can 
fix the issue. The current message can be misleading since it doesn't mention 
another possibility for out of range. Perhaps it's better to be `Invalid number 
format or out of range of long: %s`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to