Iterative String concatenation operations in JRuby degrade faster than in MRI.
------------------------------------------------------------------------------
Key: JRUBY-3252
URL: http://jira.codehaus.org/browse/JRUBY-3252
Project: JRuby
Issue Type: Improvement
Components: Core Classes/Modules
Affects Versions: JRuby 1.1.5
Environment: I see this on Solaris x64, but expect that the symptoms
would be the same across different platforms.
Reporter: Prashant Srinivasan
Attachments: string-cat-perf.tar.gz
Iterative String concatenation in JRuby seems to degrade less gracefully than
in MRI. Both use the same algorithm to concatenate, i.e. allocate a new string
s3, copy str1 into s3, copy str2 into s3 at the appropriate offset, and return
s3 [2]. MRI uses memcpy [3] to copy the input Strings into the final one.
memcpy is pretty fast, and I'm reasonably certain that the compiler generates
inline assembly for it in any case. JRuby uses System.arraycopy instead, but
arraycopy is implemented natively too (probably in assembly?), and some
profiling [4] showed that the array copy itself was not the time hog. Rather,
it seems that something in org.jruby.util.ByteList (or below) is causing the
time bloat. A concatenation program [5] shows how differently string
concatenation scales on the two implementations ([6], [7]). JRuby execution
time tends to climb rather steeply after an inflection point, while MRI
continues to grow linearly (at least across this interval).
I haven't looked into ByteList.java to see what causes the time bloat. The
real source of the problem might be time spent on memory allocation in the VM
(but probably not GC, since the HPROF profiles should certainly have caught
that?).
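To make the measurement concrete, here is a minimal sketch of the kind of
timing loop described above. It is not the attached string-scalability.rb;
the helper name time_concat, the chunk size, and the iteration counts are all
assumptions chosen for illustration. It builds a string with repeated
String#+ and reports the elapsed wall-clock time for several iteration counts,
so the same script can be run under both MRI and JRuby and the growth compared.

```ruby
require 'benchmark'

# Hypothetical helper (not taken from the attachment): builds a string by
# repeated String#+ and returns [final_length, elapsed_seconds].
def time_concat(iterations, chunk = "x" * 10)
  s = ""
  elapsed = Benchmark.realtime do
    # String#+ allocates a new string and copies both operands every time.
    iterations.times { s = s + chunk }
  end
  [s.length, elapsed]
end

if __FILE__ == $PROGRAM_NAME
  # Iteration counts are arbitrary; double them to see how the time scales.
  [1_000, 2_000, 4_000, 8_000].each do |n|
    len, secs = time_concat(n)
    printf("%6d iterations  %8d bytes  %.4f s\n", n, len, secs)
  end
end
```

Absolute timings will vary with the VM, heap settings, and JIT warm-up, so
only the shape of the curve across iteration counts is meaningful.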
References:
[2],[3]
MRI Version:
VALUE
rb_str_plus(str1, str2)
    VALUE str1, str2;
{
    VALUE str3;
    StringValue(str2);
    str3 = rb_str_new(0, RSTRING(str1)->len+RSTRING(str2)->len);
    memcpy(RSTRING(str3)->ptr, RSTRING(str1)->ptr, RSTRING(str1)->len);
    memcpy(RSTRING(str3)->ptr + RSTRING(str1)->len,
           RSTRING(str2)->ptr, RSTRING(str2)->len);
    RSTRING(str3)->ptr[RSTRING(str3)->len] = '\0';
    if (OBJ_TAINTED(str1) || OBJ_TAINTED(str2))
        OBJ_TAINT(str3);
    return str3;
}
JRuby's version:
@JRubyMethod(name = "+", required = 1)
public IRubyObject op_plus(ThreadContext context, IRubyObject other) {
    RubyString str = other.convertToString();
    ByteList result = new ByteList(value.realSize + str.value.realSize);
    result.realSize = value.realSize + str.value.realSize;
    System.arraycopy(value.bytes, value.begin, result.bytes, 0,
                     value.realSize);
    System.arraycopy(str.value.bytes, str.value.begin, result.bytes,
                     value.realSize, str.value.realSize);
    RubyString resultStr = newString(context.getRuntime(), result);
    if (isTaint() || str.isTaint()) resultStr.setTaint(true);
    return resultStr;
}
[4]
hprof.file-test.longer-run.txt
and
hprof.file-test.txt
in the attachment.
[5]
string-scalability.rb
in the attachment.
[6]
comparison.PNG
in the attachment
[7]
More details on [6] above are in jruby_vs_mri_string_scalability.ods
in the attachment.