String#split slower than 1.9
----------------------------
Key: JRUBY-5618
URL: http://jira.codehaus.org/browse/JRUBY-5618
Project: JRuby
Issue Type: Improvement
Components: Performance
Affects Versions: JRuby 1.6
Reporter: Charles Oliver Nutter
Given this benchmark (from Rubinius's benchmark suite):
{noformat}
require 'benchmark'
require 'benchmark/ips'
Benchmark.ips do |x|
  string =
"aaaa|bbbbbbbbbbbbbbbbbbbbbbbbbbbb|cccccccccccccccccccccccccccccccccccc|dd|eeeeeeeeeeeeeeeeeeeeeeeeeeeeeee|ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff|gggggggggggggggggggggggggggggggggggggggggggggggggggg|hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh|i|j|k|l|m|n|ooooooooooooooooooooooooo|"
  x.report "string split, matching delimiter" do |times|
    i = 0
    while i < times
      string.split('|')
      i += 1
    end
  end
  x.report "string split, mismatched delimiter" do |times|
    i = 0
    while i < times
      string.split('.')
      i += 1
    end
  end
end
{noformat}
We appear to lag well behind 1.9 on the second case (mismatched delimiter), roughly
226K vs. 507K i/s in the results below. Our split logic *does* largely match MRI's,
so something else is adding overhead for us. Perhaps it is the local scope overhead
(split forces a heap scope), or perhaps we're creating too many objects within the
split logic.
Here are the results from my machine:
{noformat}
~/projects/rubinius ➔ bin/rbx -I benchmark/lib/ benchmark/core/string/bench_split.rb
string split, matching delimiter
  156038.3 (±1.8%) i/s - 785571 in 5.036174s (cycle=8447)
string split, mismatched delimiter
  370903.4 (±5.6%) i/s - 1844830 in 4.994555s (cycle=14191)

~/projects/rubinius ➔ jruby -J-d64 --server -I benchmark/lib/ benchmark/core/string/bench_split.rb
string split, matching delimiter
  150389.4 (±7.9%) i/s - 743680 in 5.000700s (cycle=5810)
string split, mismatched delimiter
  225853.3 (±2.0%) i/s - 1129383 in 5.002731s (cycle=13943)

~/projects/rubinius ➔ ruby1.9 -I benchmark/lib/ benchmark/core/string/bench_split.rb
string split, matching delimiter
  112416.4 (±2.8%) i/s - 567063 in 5.048365s (cycle=9001)
string split, mismatched delimiter
  506971.1 (±3.6%) i/s - 2538240 in 5.013253s (cycle=26440)
{noformat}
Note that Rubinius does not use a regexp at all for these splits; instead it walks
the string manually, looking for the delimiter. The implications of this, especially
with respect to 1.9 and Encoding, are not clear, but it obviously helps their
performance.