[
https://issues.apache.org/jira/browse/HBASE-26566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17466188#comment-17466188
]
Hudson commented on HBASE-26566:
--------------------------------
Results for branch branch-2
[build #426 on
builds.a.o|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/426/]:
(/) *{color:green}+1 overall{color}*
----
details (if available):
(/) {color:green}+1 general checks{color}
-- For more information [see general
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/426/General_20Nightly_20Build_20Report/]
(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2)
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/426/JDK8_20Nightly_20Build_20Report_20_28Hadoop2_29/]
(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3)
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/426/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]
(/) {color:green}+1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/426/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]
(/) {color:green}+1 source release artifact{color}
-- See build output for details.
(/) {color:green}+1 client integration test{color}
> Optimize encodeNumeric in OrderedBytes
> --------------------------------------
>
> Key: HBASE-26566
> URL: https://issues.apache.org/jira/browse/HBASE-26566
> Project: HBase
> Issue Type: Task
> Components: Performance
> Reporter: Yutong Xiao
> Assignee: Yutong Xiao
> Priority: Major
> Fix For: 2.5.0, 3.0.0-alpha-3
>
> Attachments: Benchmark-encoding.log, Benchmark.log,
> EncodingBenchmark.java, JmhBenchmark.java
>
>
> Found current estimate E step in OrderedBytes is to search step by step in
> loops. We can directly calculate the length to move the point and let the
> time complexity become to O(1).
> From the comments of the method encodeNumericLarge:
> {panel:title=encodeNumericLarge}
> Encode the large magnitude floating point number *val* using the key
> encoding. The caller guarantees that *val* will be
> finite and abs({*}val{*}) >= 1.0.
> A floating point value is encoded as an integer exponent *E* and a mantissa
> {*}M{*}. The original value is equal to {*}(M * 100^E){*}. *E* is set to the
> smallest value possible without making *M* greater than or equal to 1.0.
> Each centimal digit of the mantissa is stored in a byte. If the value of the
> centimal digit is *X* (hence X>=0 and X<=99) then the byte value will be
> 2*X+1 for every byte of the mantissa, except for the last byte which will be
> 2*X+0. The mantissa must be the minimum number of bytes necessary to
> represent the value; trailing *X==0* digits areĀ omitted. This means that the
> mantissa will never contain a byte with the value 0x00.
> The function could be divided into 4 steps:
> # Confirm the sign (Negative or not)
> # Normalise the abs({*}val{*}), transformed to *(M * 100^E)*
> # Encode *M* by peeling off centimal digit and
> # Encode *X*{panel}
> At the step 2, we normalise abs(*val*) and determine the exponent *E*.
> The current implementation is :
> {code:java}
> while (abs.compareTo(E32) >= 0 && e <= 350) { abs =
> abs.movePointLeft(32); e +=16; }
> while (abs.compareTo(E8) >= 0 && e <= 350) { abs = abs.movePointLeft(8);
> e+= 4; }
> while (abs.compareTo(BigDecimal.ONE) >= 0 && e <= 350) { abs =
> abs.movePointLeft(2); e++; }
> {code}
> Which just move the point of abs(*val*), (which is in form yyyy.zzzz, where y
> & z are digits, e.g. 1000.0001) from right to left step by step, until
> abs(val) < 1.
> In the loop body, the increment of e is always the half of steps the point
> moved to left. (step=32, deltaE=16), (step=8, deltaE=4), (step=2, deltaE=1)
> As e starts from 0, the value of e must be the half of the total moved steps
> at last.
> So that we could save the above three loops and calculate the moved length
> and e directly.
> My implementation:
> {code:java}
> // This is the number of digits of the Integer part of abs(val) when val
> > 10
> int integerDigits = abs.precision() - abs.scale();
> // If length of the Integer part is odd, from the last loop above, we
> actually moved one more step forward.
> int lengthToMoveLeft = integerDigits % 2 == 0 ? integerDigits :
> integerDigits + 1;
> e = lengthToMoveLeft / 2;
> // The edge condition.
> if (e > 350) {
> e = 351;
> lengthToMoveLeft = 702;
> }
> abs = abs.movePointLeft(lengthToMoveLeft);
> {code}
> In this case, we only move the point once. The time complexity is O(1).
> Same idea to the method of encodeNumericSmall.
> From the JMH test, (*JmhBenchmark.java & Benchmark.log in attachments*), the
> throughput has an incredible improvement.
> At step 3 peeling off M into centimals. Current implementation is:
> {code:java}
> for (int i = 0; i < 18 && abs.compareTo(BigDecimal.ZERO) != 0; i++) {
> abs = abs.movePointRight(2); // Will create new BigDecimal object.
> d = abs.intValue();
> dst.put((byte) (2 * d + 1));
> abs = abs.subtract(BigDecimal.valueOf(d)); // Will create new
> BigDecimal object.
> }
> {code}
> Also move the point of big decimal step by step. BigDecimal operations are
> costly and create plenty of new BigDecimal objects, which are only used once.
> Here I tried to directly encoding the BigDecimal based on the string to avoid
> BigDecimal operations.
> My implementation:
> {code:java}
> private static void encodeToCentimal(PositionedByteRange dst, BigDecimal val)
> {
> String stringOfAbs = val.stripTrailingZeros().toPlainString();
> String value = stringOfAbs.substring(stringOfAbs.indexOf('.') + 1);
> int d;
> int maxPrecision = Math.min(MAX_NUM_ENCODE_BYTES * 2, value.length());
> for (int i = 0; i < maxPrecision; i += 2) {
> d = (value.charAt(i) - '0') * 10;
> if (i + 1 < maxPrecision) {
> d += (value.charAt(i + 1) - '0');
> }
> dst.put((byte) (2 * d + 1));
> }
> }
> {code}
> From the JMH test, (*EncodingBenchmark.java & Benchmark-encoding.log*) this
> raises the throughtput of the encoding about 200%.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)