[ 
https://issues.apache.org/jira/browse/LANG-1576?focusedWorklogId=463404&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-463404
 ]

ASF GitHub Bot logged work on LANG-1576:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 26/Jul/20 20:36
            Start Date: 26/Jul/20 20:36
    Worklog Time Spent: 10m 
      Work Description: aherbert commented on a change in pull request #565:
URL: https://github.com/apache/commons-lang/pull/565#discussion_r460568874



##########
File path: src/test/java/org/apache/commons/lang3/StringUtilsChompTest.java
##########
@@ -0,0 +1,133 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.commons.lang3;
+
+import java.util.concurrent.TimeUnit;
+import org.openjdk.jmh.annotations.Benchmark;
+import org.openjdk.jmh.annotations.BenchmarkMode;
+import org.openjdk.jmh.annotations.Mode;
+import org.openjdk.jmh.annotations.OutputTimeUnit;
+import org.openjdk.jmh.annotations.Scope;
+import org.openjdk.jmh.annotations.State;
+import static org.apache.commons.lang3.StringUtils.EMPTY;
+import static org.apache.commons.lang3.StringUtils.isEmpty;
+import static org.apache.commons.lang3.StringUtils.length;
+
+/**
+ * Test to show whether using BitSet for removeAll() methods is faster than 
using HashSet.

Review comment:
       This is not what the benchmark is doing.

##########
File path: src/test/java/org/apache/commons/lang3/StringUtilsChompTest.java
##########
@@ -0,0 +1,133 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.commons.lang3;
+
+import java.util.concurrent.TimeUnit;
+import org.openjdk.jmh.annotations.Benchmark;
+import org.openjdk.jmh.annotations.BenchmarkMode;
+import org.openjdk.jmh.annotations.Mode;
+import org.openjdk.jmh.annotations.OutputTimeUnit;
+import org.openjdk.jmh.annotations.Scope;
+import org.openjdk.jmh.annotations.State;
+import static org.apache.commons.lang3.StringUtils.EMPTY;
+import static org.apache.commons.lang3.StringUtils.isEmpty;
+import static org.apache.commons.lang3.StringUtils.length;
+
+/**
+ * Test to show whether using BitSet for removeAll() methods is faster than 
using HashSet.
+ */
+@BenchmarkMode(Mode.AverageTime)
+@OutputTimeUnit(TimeUnit.NANOSECONDS)
+@State(Scope.Thread)
+public class StringUtilsChompTest {
+
+    private static final String string1 = "\r";
+    private static final String string2 = buildString2();
+
+    private static String buildString2() {
+        StringBuilder stringBuilder = new StringBuilder();
+        for (int i = 0; i < 100000; i++) {
+            stringBuilder.append('0');
+            stringBuilder.append('\r');
+            stringBuilder.append('0');
+            stringBuilder.append('\n');
+            stringBuilder.append('0');
+            stringBuilder.append('\r');
+            stringBuilder.append('\n');
+        }
+        return stringBuilder.toString();
+    }
+
+    @Benchmark
+    public void test1Old() {
+        chompOld(string1);

Review comment:
       You should return String from all your benchmark methods. Otherwise 
there is nothing to prevent the JVM from not running your chomp methods as the 
result is discarded. If you return the chomped String then the JMH framework 
will consume it thus forcing the JVM to run the method.

##########
File path: src/test/java/org/apache/commons/lang3/StringUtilsChompTest.java
##########
@@ -0,0 +1,133 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.commons.lang3;
+
+import java.util.concurrent.TimeUnit;
+import org.openjdk.jmh.annotations.Benchmark;
+import org.openjdk.jmh.annotations.BenchmarkMode;
+import org.openjdk.jmh.annotations.Mode;
+import org.openjdk.jmh.annotations.OutputTimeUnit;
+import org.openjdk.jmh.annotations.Scope;
+import org.openjdk.jmh.annotations.State;
+import static org.apache.commons.lang3.StringUtils.EMPTY;
+import static org.apache.commons.lang3.StringUtils.isEmpty;
+import static org.apache.commons.lang3.StringUtils.length;
+
+/**
+ * Test to show whether using BitSet for removeAll() methods is faster than 
using HashSet.
+ */
+@BenchmarkMode(Mode.AverageTime)
+@OutputTimeUnit(TimeUnit.NANOSECONDS)
+@State(Scope.Thread)
+public class StringUtilsChompTest {
+
+    private static final String string1 = "\r";
+    private static final String string2 = buildString2();
+
+    private static String buildString2() {
+        StringBuilder stringBuilder = new StringBuilder();
+        for (int i = 0; i < 100000; i++) {
+            stringBuilder.append('0');
+            stringBuilder.append('\r');
+            stringBuilder.append('0');
+            stringBuilder.append('\n');
+            stringBuilder.append('0');
+            stringBuilder.append('\r');
+            stringBuilder.append('\n');
+        }
+        return stringBuilder.toString();
+    }
+
+    @Benchmark
+    public void test1Old() {
+        chompOld(string1);
+    }
+
+    @Benchmark
+    public void test1New() {
+        chompNew(string1);
+    }
+
+    @Benchmark
+    public void test2Old() {
+        chompOld(string2);

Review comment:
       These test2 method use a 100,000 * 7 character string but only chomp it 
once. So you are not really testing anything but a chomp of a CR LF pair from 
the end of a 700,000 character string. So the benchmark time difference (from 
test1) is the time is for hitting a different memory cache level, and the 
runtime difference of the single length input string code path used in test1.
   
   Note you cannot continuously chomp the string in a loop as the method will 
get stuck at the '0' character and not shorten it. This test is measuring 
performance of the method, but perhaps not in the way you intended.
   
   Ideally you should have test for strings ending with:
   
   - CR
   - LF
   - CR + LF
   - None of the above
   
   With and without some preceding characters to hit the length 1 code path.
   
   You then test the speed through all the execution paths.
   
   If you want to avoid the CPU pipeline branch prediction from learning the 
branch to take for each of your code paths then you should build the different 
strings and chomp all of them in a single loop. Branch prediction will fall 
down when the size of the data is >2000 so you would need maybe 10000 random 
ending strings. These should be consumed by a JMH Blackhole to avoid the JVM 
from optimising out the calling of a no-effect method:
   
   ```
   String[] data = ...;
   
   @Benchmark
   public void testNew(Blackhole bh) {
       for (final String s : data) {
           bh.consume(chompNew(s));
       }
   }
   ```
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 463404)
    Time Spent: 1h  (was: 50m)

> refine StringUtils.chomp
> ------------------------
>
>                 Key: LANG-1576
>                 URL: https://issues.apache.org/jira/browse/LANG-1576
>             Project: Commons Lang
>          Issue Type: Sub-task
>            Reporter: Jin Xu
>            Priority: Minor
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> [https://github.com/apache/commons-lang/pull/565]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to