[jira] [Updated] (HBASE-26566) Optimize encodeNumeric in OrderedBytes

Yutong Xiao (Jira) Thu, 16 Dec 2021 03:43:04 -0800


     [ 
https://issues.apache.org/jira/browse/HBASE-26566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Yutong Xiao updated HBASE-26566:
--------------------------------
    Description: 
Found current estimate E step in OrderedBytes is to search step by step in 
loops. We can directly calculate the length to move the point and let the time 
complexity become to O(1).
>From the comments of the method encodeNumericLarge:
{panel:title=encodeNumericLarge}
Encode the large magnitude floating point number *val* using the key encoding. 
The caller guarantees that *val* will be
finite and abs({*}val{*}) >= 1.0. 
A floating point value is encoded as an integer exponent *E* and a mantissa 
{*}M{*}. The original value is equal to {*}(M * 100^E){*}. *E* is set to the 
smallest value possible without making *M* greater than or equal to 1.0.
Each centimal digit of the mantissa is stored in a byte. If the value of the 
centimal digit is *X* (hence X>=0 and X<=99) then the byte value will be 2*X+1 
for every byte of the mantissa, except for the last byte which will be 2*X+0. 
The mantissa must be the minimum number of bytes necessary to represent the 
value; trailing *X==0* digits are omitted. This means that the mantissa will 
never contain a byte with the value 0x00.
The function could be divided into 4 steps:
 # Confirm the sign (Negative or not)
 # Normalise the abs({*}val{*}), transformed to *(M * 100^E)*
 # Encode *M* by peeling off centimal digit and 
 # Encode *X*{panel}
At the step 2, we normalise abs(*val*) and determine the exponent *E*. 
The current implementation is :
{code:java}
    while (abs.compareTo(E32) >= 0 && e <= 350) { abs = abs.movePointLeft(32); 
e +=16; }
    while (abs.compareTo(E8) >= 0 && e <= 350) { abs = abs.movePointLeft(8); 
e+= 4; }
    while (abs.compareTo(BigDecimal.ONE) >= 0 && e <= 350) { abs = 
abs.movePointLeft(2); e++; }
{code}
Which just move the point of abs(*val*), (which is in form yyyy.zzzz, where y & 
z are digits, e.g. 1000.0001) from right to left step by step, until abs(val) < 
1. 
In the loop body, the increment of e is always the half of steps the point 
moved to left. (step=32, deltaE=16), (step=8, deltaE=4), (step=2, deltaE=1)
As e starts from 0, the value of e must be the half of the total moved steps at 
last.
So that we could save the above three loops and calculate the moved length and 
e directly.
My implementation:
{code:java}
    // This is the number of digits of the Integer part of abs(val) when val > 
10
    int integerDigits = abs.precision() - abs.scale();
    // If length of the Integer part is odd, from the last loop above, we 
actually moved one more step forward.
    int lengthToMoveLeft = integerDigits % 2 == 0 ? integerDigits : 
integerDigits + 1;
    e = lengthToMoveLeft / 2;
    // The edge condition.
    if (e > 350) {
      e = 351;
      lengthToMoveLeft = 702;
    }
    abs = abs.movePointLeft(lengthToMoveLeft);
{code}
In this case, we only move the point once. The time complexity is O(1).
Same idea to the method of encodeNumericSmall. 
>From the JMH test, (*JmhBenchmark.java & Benchmark.log in attachments*), the 
>throughput has an incredible improvement.

At step 3 peeling off M into centimals. Current implementation is:

{code:java}
for (int i = 0; i < 18 && abs.compareTo(BigDecimal.ZERO) != 0; i++) {
      abs = abs.movePointRight(2); // Will create new BigDecimal object.
      d = abs.intValue();
      dst.put((byte) (2 * d + 1));
      abs = abs.subtract(BigDecimal.valueOf(d)); // Will create new BigDecimal 
object.
}
{code}
Also move the point of big decimal step by step. BigDecimal operations are 
costly and create plenty of new BigDecimal objects, which are only used once. 
Here I tried to directly encoding the BigDecimal based on the string to avoid 
BigDecimal operations. 
My implementation:

{code:java}
private static void encodeToCentimal(PositionedByteRange dst, BigDecimal val) {
    String stringOfAbs = val.stripTrailingZeros().toPlainString();
    String value = stringOfAbs.substring(stringOfAbs.indexOf('.') + 1);
    int d;

    int maxPrecision = Math.min(MAX_NUM_ENCODE_BYTES * 2, value.length());
    for (int i = 0; i < maxPrecision; i += 2) {
      d = (value.charAt(i) - '0') * 10;
      if (i + 1 < maxPrecision) {
        d += (value.charAt(i + 1) - '0');
      }
      dst.put((byte) (2 * d + 1));
    }
  }
{code}
>From the JMH test, (*EncodingBenchmark.java & Benchmark-encoding.log*) this 
>raises the throughtput of the encoding about 200%.

Have did a jmh benchmark. The result and code are in the attachments. 

  was:
Found current estimate E step in OrderedBytes is to search step by step in 
loops. We can directly calculate the length to move the point and let the time 
complexity become to O(1).
>From the comments of the method encodeNumericLarge:
{panel:title=encodeNumericLarge}
Encode the large magnitude floating point number *val* using the key encoding. 
The caller guarantees that *val* will be
finite and abs({*}val{*}) >= 1.0. 
A floating point value is encoded as an integer exponent *E* and a mantissa 
{*}M{*}. The original value is equal to {*}(M * 100^E){*}. *E* is set to the 
smallest value possible without making *M* greater than or equal to 1.0.
Each centimal digit of the mantissa is stored in a byte. If the value of the 
centimal digit is *X* (hence X>=0 and X<=99) then the byte value will be 2*X+1 
for every byte of the mantissa, except for the last byte which will be 2*X+0. 
The mantissa must be the minimum number of bytes necessary to represent the 
value; trailing *X==0* digits are omitted. This means that the mantissa will 
never contain a byte with the value 0x00.
The function could be divided into 4 steps:
 # Confirm the sign (Negative or not)
 # Normalise the abs({*}val{*}), transformed to *(M * 100^E)*
 # Encode *M* by peeling off centimal digit and 
 # Encode *X*{panel}
At the step 2, we normalise abs(*val*) and determine the exponent *E*. 
The current implementation is :
{code:java}
    while (abs.compareTo(E32) >= 0 && e <= 350) { abs = abs.movePointLeft(32); 
e +=16; }
    while (abs.compareTo(E8) >= 0 && e <= 350) { abs = abs.movePointLeft(8); 
e+= 4; }
    while (abs.compareTo(BigDecimal.ONE) >= 0 && e <= 350) { abs = 
abs.movePointLeft(2); e++; }
{code}
Which just move the point of abs(*val*), (which is in form yyyy.zzzz, where y & 
z are digits, e.g. 1000.0001) from right to left step by step, until abs(val) < 
1. 
In the loop body, the increment of e is always the half of steps the point 
moved to left. (step=32, deltaE=16), (step=8, deltaE=4), (step=2, deltaE=1)
As e starts from 0, the value of e must be the half of the total moved steps at 
last.
So that we could save the above three loops and calculate the moved length and 
e directly.
My implementation:
{code:java}
    // This is the number of digits of the Integer part of abs(val) when val > 
10
    int integerDigits = abs.precision() - abs.scale();
    // If length of the Integer part is odd, from the last loop above, we 
actually moved one more step forward.
    int lengthToMoveLeft = integerDigits % 2 == 0 ? integerDigits : 
integerDigits + 1;
    e = lengthToMoveLeft / 2;
    // The edge condition.
    if (e > 350) {
      e = 351;
      lengthToMoveLeft = 702;
    }
    abs = abs.movePointLeft(lengthToMoveLeft);
{code}
In this case, we only move the point once. The time complexity is O(1).
Same idea to the method of encodeNumericSmall. 
>From the JMH test, (JmhBenchmark.java & Benchmark.log in attachments), the 
>throughput has an incredible improvement.

At step 3 peeling off M into centimals. Current implementation is:

{code:java}
for (int i = 0; i < 18 && abs.compareTo(BigDecimal.ZERO) != 0; i++) {
      abs = abs.movePointRight(2); // Will create new BigDecimal object.
      d = abs.intValue();
      dst.put((byte) (2 * d + 1));
      abs = abs.subtract(BigDecimal.valueOf(d)); // Will create new BigDecimal 
object.
}
{code}
Also move the point of big decimal step by step. BigDecimal operations are 
costly and create plenty of new BigDecimal objects, which are only used once. 
Here I tried to directly encoding the BigDecimal based on the string to avoid 
BigDecimal operations. 
My implementation:

{code:java}
private static void encodeToCentimal(PositionedByteRange dst, BigDecimal val) {
    String stringOfAbs = val.stripTrailingZeros().toPlainString();
    String value = stringOfAbs.substring(stringOfAbs.indexOf('.') + 1);
    int d;

    int maxPrecision = Math.min(MAX_NUM_ENCODE_BYTES * 2, value.length());
    for (int i = 0; i < maxPrecision; i += 2) {
      d = (value.charAt(i) - '0') * 10;
      if (i + 1 < maxPrecision) {
        d += (value.charAt(i + 1) - '0');
      }
      dst.put((byte) (2 * d + 1));
    }
  }
{code}
>From the JMH test, (EncodingBenchmark.java & Benchmark-encoding.log) this 
>raises the throughtput of the encoding about 200%.

Have did a jmh benchmark. The result and code are in the attachments. 


> Optimize encodeNumeric in OrderedBytes
> --------------------------------------
>
>                 Key: HBASE-26566
>                 URL: https://issues.apache.org/jira/browse/HBASE-26566
>             Project: HBase
>          Issue Type: Task
>          Components: Performance
>            Reporter: Yutong Xiao
>            Assignee: Yutong Xiao
>            Priority: Major
>         Attachments: Benchmark-encoding.log, Benchmark.log, 
> EncodingBenchmark.java, JmhBenchmark.java
>
>
> Found current estimate E step in OrderedBytes is to search step by step in 
> loops. We can directly calculate the length to move the point and let the 
> time complexity become to O(1).
> From the comments of the method encodeNumericLarge:
> {panel:title=encodeNumericLarge}
> Encode the large magnitude floating point number *val* using the key 
> encoding. The caller guarantees that *val* will be
> finite and abs({*}val{*}) >= 1.0. 
> A floating point value is encoded as an integer exponent *E* and a mantissa 
> {*}M{*}. The original value is equal to {*}(M * 100^E){*}. *E* is set to the 
> smallest value possible without making *M* greater than or equal to 1.0.
> Each centimal digit of the mantissa is stored in a byte. If the value of the 
> centimal digit is *X* (hence X>=0 and X<=99) then the byte value will be 
> 2*X+1 for every byte of the mantissa, except for the last byte which will be 
> 2*X+0. The mantissa must be the minimum number of bytes necessary to 
> represent the value; trailing *X==0* digits are omitted. This means that the 
> mantissa will never contain a byte with the value 0x00.
> The function could be divided into 4 steps:
>  # Confirm the sign (Negative or not)
>  # Normalise the abs({*}val{*}), transformed to *(M * 100^E)*
>  # Encode *M* by peeling off centimal digit and 
>  # Encode *X*{panel}
> At the step 2, we normalise abs(*val*) and determine the exponent *E*. 
> The current implementation is :
> {code:java}
>     while (abs.compareTo(E32) >= 0 && e <= 350) { abs = 
> abs.movePointLeft(32); e +=16; }
>     while (abs.compareTo(E8) >= 0 && e <= 350) { abs = abs.movePointLeft(8); 
> e+= 4; }
>     while (abs.compareTo(BigDecimal.ONE) >= 0 && e <= 350) { abs = 
> abs.movePointLeft(2); e++; }
> {code}
> Which just move the point of abs(*val*), (which is in form yyyy.zzzz, where y 
> & z are digits, e.g. 1000.0001) from right to left step by step, until 
> abs(val) < 1. 
> In the loop body, the increment of e is always the half of steps the point 
> moved to left. (step=32, deltaE=16), (step=8, deltaE=4), (step=2, deltaE=1)
> As e starts from 0, the value of e must be the half of the total moved steps 
> at last.
> So that we could save the above three loops and calculate the moved length 
> and e directly.
> My implementation:
> {code:java}
>     // This is the number of digits of the Integer part of abs(val) when val 
> > 10
>     int integerDigits = abs.precision() - abs.scale();
>     // If length of the Integer part is odd, from the last loop above, we 
> actually moved one more step forward.
>     int lengthToMoveLeft = integerDigits % 2 == 0 ? integerDigits : 
> integerDigits + 1;
>     e = lengthToMoveLeft / 2;
>     // The edge condition.
>     if (e > 350) {
>       e = 351;
>       lengthToMoveLeft = 702;
>     }
>     abs = abs.movePointLeft(lengthToMoveLeft);
> {code}
> In this case, we only move the point once. The time complexity is O(1).
> Same idea to the method of encodeNumericSmall. 
> From the JMH test, (*JmhBenchmark.java & Benchmark.log in attachments*), the 
> throughput has an incredible improvement.
> At step 3 peeling off M into centimals. Current implementation is:
> {code:java}
> for (int i = 0; i < 18 && abs.compareTo(BigDecimal.ZERO) != 0; i++) {
>       abs = abs.movePointRight(2); // Will create new BigDecimal object.
>       d = abs.intValue();
>       dst.put((byte) (2 * d + 1));
>       abs = abs.subtract(BigDecimal.valueOf(d)); // Will create new 
> BigDecimal object.
> }
> {code}
> Also move the point of big decimal step by step. BigDecimal operations are 
> costly and create plenty of new BigDecimal objects, which are only used once. 
> Here I tried to directly encoding the BigDecimal based on the string to avoid 
> BigDecimal operations. 
> My implementation:
> {code:java}
> private static void encodeToCentimal(PositionedByteRange dst, BigDecimal val) 
> {
>     String stringOfAbs = val.stripTrailingZeros().toPlainString();
>     String value = stringOfAbs.substring(stringOfAbs.indexOf('.') + 1);
>     int d;
>     int maxPrecision = Math.min(MAX_NUM_ENCODE_BYTES * 2, value.length());
>     for (int i = 0; i < maxPrecision; i += 2) {
>       d = (value.charAt(i) - '0') * 10;
>       if (i + 1 < maxPrecision) {
>         d += (value.charAt(i + 1) - '0');
>       }
>       dst.put((byte) (2 * d + 1));
>     }
>   }
> {code}
> From the JMH test, (*EncodingBenchmark.java & Benchmark-encoding.log*) this 
> raises the throughtput of the encoding about 200%.
> Have did a jmh benchmark. The result and code are in the attachments. 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Updated] (HBASE-26566) Optimize encodeNumeric in OrderedBytes

Reply via email to