YuyuZha0 commented on issue #443: Optimize string split methods: 1. Use ThreadLocal to make reuse of th…
URL: https://github.com/apache/commons-lang/pull/443#issuecomment-524597564

@kinow Thanks for the careful review! Nice weekend, isn't it? I will update the code later following your advice. For now I have been focusing on performance. I tried more cases, and here are the results:

```
Benchmark                                   (arrayLen)  Mode  Cnt     Score    Error  Units
StringSplitBenchmark.testCommonsLang3Split          10  avgt   25   499.547 ± 10.716  ns/op
StringSplitBenchmark.testCommonsLang3Split          30  avgt   25  1502.510 ± 16.956  ns/op
StringSplitBenchmark.testCommonsLang3Split          50  avgt   25  2467.303 ± 18.970  ns/op
StringSplitBenchmark.testFastSplitUtils             10  avgt   25   396.252 ±  4.653  ns/op
StringSplitBenchmark.testFastSplitUtils             30  avgt   25  1145.600 ±  5.604  ns/op
StringSplitBenchmark.testFastSplitUtils             50  avgt   25  1885.414 ±  4.121  ns/op
StringSplitBenchmark.testGuavaSplit                 10  avgt   25   565.904 ±  5.483  ns/op
StringSplitBenchmark.testGuavaSplit                 30  avgt   25  1665.049 ± 81.051  ns/op
StringSplitBenchmark.testGuavaSplit                 50  avgt   25  2758.394 ±  7.684  ns/op
```

The benchmark cases are shown below:

```
import com.google.common.base.Splitter;
import org.apache.commons.lang3.StringUtils;
import org.openjdk.jmh.annotations.*;

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/**
 * Compares three ways of splitting a '@'-separated string.
 *
 * @author zhaoyuyu
 * @since 2019-08-21
 **/
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@BenchmarkMode(Mode.AverageTime)
@Warmup(iterations = 5, time = 5)
@Measurement(iterations = 5, time = 5)
public class StringSplitBenchmark {

    private static final char separator = '@';
    private static final Splitter splitter = Splitter.on(separator);

    @Benchmark
    public String[] testCommonsLang3Split(StringSupplier stringSupplier) {
        return StringUtils.splitPreserveAllTokens(stringSupplier.get(), separator);
    }

    @Benchmark
    public String[] testFastSplitUtils(StringSupplier stringSupplier) {
        return FastSplitUtils.splitPreserveAllTokens(stringSupplier.get(), separator);
    }

    @Benchmark
    public String[] testGuavaSplit(StringSupplier supplier) {
        return splitter.splitToList(supplier.get()).toArray(new String[0]);
    }

    /** Per-thread source of pre-generated input strings, cycled round-robin. */
    @State(Scope.Thread)
    public static class StringSupplier implements Supplier<String> {

        // Number of random ints joined into each input string.
        @Param({"10", "30", "50"})
        private int arrayLen;
        private String[] array;
        private int index = 0;

        @Setup
        public void setup() {
            // Build 1000 inputs up front so generation cost stays out of the measurement.
            List<String> list = new ArrayList<>(1000);
            ThreadLocalRandom random = ThreadLocalRandom.current();
            for (int i = 0; i < 1000; i++) {
                String s = StringUtils.join(random.ints(arrayLen).toArray(), separator);
                list.add(s);
            }
            this.array = list.toArray(new String[0]);
        }

        @Override
        public String get() {
            // Wrap around once every input has been handed out.
            if (index >= array.length) {
                index = 0;
            }
            return array[index++];
        }
    }
}
```

The reason I propose this optimization is that these methods are sometimes under really heavy use (in my case, I use splitPreserveAllTokens for log processing, and the method is called **billions** of times every day). So for me, performance really matters. StringUtils is widely used and any change must be made cautiously, so I fully understand your concern. In computer science everything is a trade-off, and this is a really hard choice.
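To put the scores above in relative terms, here is a minimal sketch that computes how much time per call FastSplitUtils saves against the other two implementations. The class name `SplitBenchmarkSummary` and the helper `percentSaved` are hypothetical names introduced only for this illustration; the numbers are the ns/op scores copied from the table, and the error margins are ignored.

```java
import java.util.Locale;

// Hypothetical helper class, not part of the PR: summarizes the JMH scores above.
public class SplitBenchmarkSummary {

    // Scores in ns/op copied from the benchmark table, for arrayLen = 10, 30, 50.
    static final int[] ARRAY_LENS = {10, 30, 50};
    static final double[] COMMONS_LANG3 = {499.547, 1502.510, 2467.303};
    static final double[] FAST_SPLIT = {396.252, 1145.600, 1885.414};
    static final double[] GUAVA = {565.904, 1665.049, 2758.394};

    /** Percentage of time per call saved by the candidate relative to the baseline. */
    static double percentSaved(double baselineNs, double candidateNs) {
        return (baselineNs - candidateNs) / baselineNs * 100.0;
    }

    public static void main(String[] args) {
        for (int i = 0; i < ARRAY_LENS.length; i++) {
            System.out.printf(Locale.ROOT,
                    "arrayLen=%d: %.1f%% less time than commons-lang3, %.1f%% less than Guava%n",
                    ARRAY_LENS[i],
                    percentSaved(COMMONS_LANG3[i], FAST_SPLIT[i]),
                    percentSaved(GUAVA[i], FAST_SPLIT[i]));
        }
    }
}
```

Under these numbers the saving is roughly 21 to 24 percent against commons-lang3 and roughly 30 to 32 percent against Guava, and it stays fairly stable as the input grows.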
