String#split slower than 1.9
----------------------------

                 Key: JRUBY-5618
                 URL: http://jira.codehaus.org/browse/JRUBY-5618
             Project: JRuby
          Issue Type: Improvement
          Components: Performance
    Affects Versions: JRuby 1.6
            Reporter: Charles Oliver Nutter


Given this benchmark (from Rubinius's benchmark suite):

{noformat}
require 'benchmark'
require 'benchmark/ips'

Benchmark.ips do |x|
  string = 
"aaaa|bbbbbbbbbbbbbbbbbbbbbbbbbbbb|cccccccccccccccccccccccccccccccccccc|dd|eeeeeeeeeeeeeeeeeeeeeeeeeeeeeee|ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff|gggggggggggggggggggggggggggggggggggggggggggggggggggg|hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh|i|j|k|l|m|n|ooooooooooooooooooooooooo|"

  x.report "string split, matching delimiter" do |times|
    i = 0
    while i < times
      string.split('|')
      i += 1
    end
  end

  x.report "string split, mismatched delimiter" do |times|
    i = 0
    while i < times
      string.split('.')
      i += 1
    end
  end
end
{noformat}

We lag well behind 1.9 on the second case (mismatched delimiter): in the 
results below JRuby manages about 226k i/s against roughly 507k i/s for 1.9. 
Our logic *does* largely match MRI's, so something is causing extra overhead 
for us. Perhaps it is the local scope overhead (split forces a heap scope), or 
perhaps we're creating too many objects within the split logic.

Here are the results from my machine:

{noformat}
~/projects/rubinius ➔ bin/rbx -I benchmark/lib/ benchmark/core/string/bench_split.rb
string split, matching delimiter
                       156038.3 (±1.8%) i/s -     785571 in   5.036174s (cycle=8447)
string split, mismatched delimiter
                       370903.4 (±5.6%) i/s -    1844830 in   4.994555s (cycle=14191)

~/projects/rubinius ➔ jruby -J-d64 --server -I benchmark/lib/ benchmark/core/string/bench_split.rb
string split, matching delimiter
                       150389.4 (±7.9%) i/s -     743680 in   5.000700s (cycle=5810)
string split, mismatched delimiter
                       225853.3 (±2.0%) i/s -    1129383 in   5.002731s (cycle=13943)

~/projects/rubinius ➔ ruby1.9 -I benchmark/lib/ benchmark/core/string/bench_split.rb
string split, matching delimiter
                       112416.4 (±2.8%) i/s -     567063 in   5.048365s (cycle=9001)
string split, mismatched delimiter
                       506971.1 (±3.6%) i/s -    2538240 in   5.013253s (cycle=26440)
{noformat}
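
To make the gap easier to read, the reported i/s figures can be normalised against 1.9. The snippet below is only a convenience sketch: the numbers are copied from the runs above, while the `RESULTS` table and the `relative_to` helper are names made up for this illustration.

```ruby
# Iterations per second, copied from the benchmark output above.
RESULTS = {
  "rbx"     => { matching: 156038.3, mismatched: 370903.4 },
  "jruby"   => { matching: 150389.4, mismatched: 225853.3 },
  "ruby1.9" => { matching: 112416.4, mismatched: 506971.1 },
}

# Hypothetical helper: divide every rate by the baseline implementation's
# rate for the same report, rounded to two decimals.
def relative_to(results, baseline)
  base = results.fetch(baseline)
  results.each_with_object({}) do |(impl, rates), out|
    out[impl] = rates.each_with_object({}) do |(name, ips), row|
      row[name] = (ips / base.fetch(name)).round(2)
    end
  end
end

RATIOS = relative_to(RESULTS, "ruby1.9")
RATIOS.each { |impl, row| puts "#{impl}: #{row.inspect}" }
```

On these figures JRuby comes out ahead of 1.9 on the matching case but at under half of 1.9's throughput on the mismatched case, which is the case this issue is about.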

Note that Rubinius does not use a regexp at all for the splits in question, 
instead manually walking the string looking for matches. The implications of 
this, especially wrt 1.9 and Encoding, are not clear...but it obviously helps 
them perf-wise.
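
For illustration, a regexp-free split that walks the string might look roughly like the sketch below. This is not Rubinius's actual code, and `manual_split` is a name invented here; it assumes a non-empty, non-whitespace string delimiter and no limit argument, and it leans on `String#index`, so it leaves the Encoding question open.

```ruby
# Hypothetical sketch of a split that avoids regexps entirely.
# Not taken from Rubinius, MRI, or JRuby.
def manual_split(string, delimiter)
  # The real String#split special-cases " " (awk-style) and "" (per
  # character); this sketch deliberately does not handle either.
  if delimiter.empty? || delimiter == " "
    raise ArgumentError, "unsupported delimiter in this sketch"
  end

  fields = []
  start = 0
  # String#index with a String argument is a plain substring search.
  while (hit = string.index(delimiter, start))
    fields << string[start...hit]
    start = hit + delimiter.length
  end
  fields << string[start..-1]

  # Without a limit, String#split drops trailing empty fields.
  fields.pop while !fields.empty? && fields.last.empty?
  fields
end

p manual_split("aaaa|bb|c|", "|")
p manual_split("aaaa|bb|c|", ".")
```

In the mismatched case the loop body never runs, so the work is a single scan plus one substring copy, which may be part of why a manual walk pays off there.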
