shuttie commented on issue #10358: [FLINK-14346] [serialization] faster implementation of StringValue writeString and readString
URL: https://github.com/apache/flink/pull/10358#issuecomment-561581049
 
 
   @AHeise thanks for all the ideas, I've updated the PR with all the proposals applied.
   
   As for the `writeString` fallback code, I've found a better way of dealing with short strings that doesn't require a separate code path. If you stare long enough at the JMH perfasm listing for short strings, you may notice that most of the time (compared with the original implementation) is spent in the initial buffer size computation. The original unbuffered code has no reason to compute it, as there is no buffer. But in this PR we need to scan the string twice: once to compute the buffer size, and again to write the characters into the buffer.
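   The two-pass shape described above can be sketched roughly like this. This is a simplified toy varint-style encoding, not Flink's exact `StringValue` format, and `encodedSize`/`encode` are hypothetical names used only for illustration:

   ```java
   // Sketch of the two-pass buffered approach: pass 1 computes the encoded
   // size, pass 2 fills a buffer of exactly that size. Chars below 0x80 take
   // one byte; larger chars are split into 7-bit groups with a continuation
   // bit, so a 16-bit char takes at most 3 bytes.
   public class TwoPassWriter {

       // Pass 1: how many bytes the encoded string will occupy.
       static int encodedSize(String s) {
           int size = 0;
           for (int i = 0; i < s.length(); i++) {
               char c = s.charAt(i);
               size += c < 0x80 ? 1 : (c < 0x4000 ? 2 : 3);
           }
           return size;
       }

       // Pass 2: write the characters into a buffer sized by pass 1.
       static byte[] encode(String s) {
           byte[] buf = new byte[encodedSize(s)];
           int pos = 0;
           for (int i = 0; i < s.length(); i++) {
               int c = s.charAt(i);
               while (c >= 0x80) {               // continuation bytes, high bit set
                   buf[pos++] = (byte) (c | 0x80);
                   c >>>= 7;
               }
               buf[pos++] = (byte) c;            // final byte, high bit clear
           }
           return buf;
       }
   }
   ```

   For a short string, the size pass costs nearly as much as the write pass itself, which is where the overhead shows up in perfasm.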
   
   The main idea of this PR is to leverage CPU-level parallelism by helping the CPU process multiple characters at once. The problem with short strings is that there is nothing to parallelize, so the double-scanning overhead starts to kill the performance.
   
   The proposed fix is to over-allocate the buffer for short strings, skipping the exact buffer size computation. I've found the tipping point for this approach lying somewhere between 6 and 8 characters:
   * for strings < 6 chars it's faster to over-allocate,
   * for strings of 6-8 chars it's the same as the exact computation,
   * for strings > 8 chars it can be slower, but only insignificantly; in theory it may produce some GC pressure.
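   The over-allocation shortcut could look something like the sketch below. Again, this is a hypothetical illustration using the same toy encoding, not Flink's actual code; the threshold constant and all names are assumptions:

   ```java
   import java.util.Arrays;

   // Sketch of the short-string shortcut: below a small threshold we skip the
   // exact-size pass and allocate for the worst case (3 bytes per char in this
   // toy encoding), trading a slightly larger allocation for one less scan.
   public class ShortStringWriter {

       static final int SHORT_STRING_THRESHOLD = 8; // tipping point from the benchmarks

       static byte[] encode(String s) {
           int cap = s.length() <= SHORT_STRING_THRESHOLD
                   ? s.length() * 3     // over-allocate: worst case, no size pass
                   : exactSize(s);      // long string: the size pass pays off
           byte[] buf = new byte[cap];
           int pos = 0;
           for (int i = 0; i < s.length(); i++) {
               int c = s.charAt(i);
               while (c >= 0x80) {      // 7 bits per byte, high bit = continuation
                   buf[pos++] = (byte) (c | 0x80);
                   c >>>= 7;
               }
               buf[pos++] = (byte) c;
           }
           // trim only when we over-allocated (a real implementation would
           // likely write into the output view directly and skip this copy)
           return pos == buf.length ? buf : Arrays.copyOf(buf, pos);
       }

       static int exactSize(String s) {
           int size = 0;
           for (int i = 0; i < s.length(); i++) {
               char c = s.charAt(i);
               size += c < 0x80 ? 1 : (c < 0x4000 ? 2 : 3);
           }
           return size;
       }
   }
   ```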
   
   The current round of benchmarks:
   ```
   [info] Benchmark                                    (length)  (stringType)  Mode  Cnt   Score   Error  Units
   [info] StringDeserializerBenchmark.deserializeDefault          1         ascii  avgt   50   45.618 ± 0.339  ns/op
   [info] StringDeserializerBenchmark.deserializeDefault          2         ascii  avgt   50   61.348 ± 0.579  ns/op
   [info] StringDeserializerBenchmark.deserializeDefault          4         ascii  avgt   50   88.067 ± 1.058  ns/op
   [info] StringDeserializerBenchmark.deserializeDefault          8         ascii  avgt   50  142.902 ± 1.121  ns/op
   [info] StringDeserializerBenchmark.deserializeDefault         16         ascii  avgt   50  249.181 ± 1.920  ns/op
   [info] StringDeserializerBenchmark.deserializeDefault         32         ascii  avgt   50  466.382 ± 1.502  ns/op
   [info] StringDeserializerBenchmark.deserializeImproved         1         ascii  avgt   50   49.916 ± 0.132  ns/op
   [info] StringDeserializerBenchmark.deserializeImproved         2         ascii  avgt   50   50.278 ± 0.064  ns/op
   [info] StringDeserializerBenchmark.deserializeImproved         4         ascii  avgt   50   50.365 ± 0.129  ns/op
   [info] StringDeserializerBenchmark.deserializeImproved         8         ascii  avgt   50   52.463 ± 0.301  ns/op
   [info] StringDeserializerBenchmark.deserializeImproved        16         ascii  avgt   50   55.711 ± 0.597  ns/op
   [info] StringDeserializerBenchmark.deserializeImproved        32         ascii  avgt   50   65.342 ± 0.555  ns/op
   [info] StringSerializerBenchmark.serializeDefault              1         ascii  avgt   50   31.076 ± 0.192  ns/op
   [info] StringSerializerBenchmark.serializeDefault              2         ascii  avgt   50   31.770 ± 1.811  ns/op
   [info] StringSerializerBenchmark.serializeDefault              4         ascii  avgt   50   39.251 ± 0.189  ns/op
   [info] StringSerializerBenchmark.serializeDefault              8         ascii  avgt   50   57.736 ± 0.253  ns/op
   [info] StringSerializerBenchmark.serializeDefault             16         ascii  avgt   50   94.964 ± 0.514  ns/op
   [info] StringSerializerBenchmark.serializeDefault             32         ascii  avgt   50  168.754 ± 1.416  ns/op
   [info] StringSerializerBenchmark.serializeImproved             1         ascii  avgt   50   30.145 ± 0.156  ns/op
   [info] StringSerializerBenchmark.serializeImproved             2         ascii  avgt   50   30.873 ± 0.274  ns/op
   [info] StringSerializerBenchmark.serializeImproved             4         ascii  avgt   50   31.993 ± 0.276  ns/op
   [info] StringSerializerBenchmark.serializeImproved             8         ascii  avgt   50   46.220 ± 0.211  ns/op
   [info] StringSerializerBenchmark.serializeImproved            16         ascii  avgt   50   50.856 ± 0.826  ns/op
   [info] StringSerializerBenchmark.serializeImproved            32         ascii  avgt   50   63.221 ± 1.130  ns/op
   ```
   So for large strings the new implementation is much faster, and for short ones it's not regressing (it's even slightly faster).
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
