Iterative String concatenation operations in JRuby degrade faster than in MRI.
------------------------------------------------------------------------------
                 Key: JRUBY-3252
                 URL: http://jira.codehaus.org/browse/JRUBY-3252
             Project: JRuby
          Issue Type: Improvement
          Components: Core Classes/Modules
    Affects Versions: JRuby 1.1.5
         Environment: I see this on Solaris x64, but expect that the symptoms would be the same across different platforms.
            Reporter: Prashant Srinivasan
         Attachments: string-cat-perf.tar.gz

Iterative String concatenation operations in JRuby seem to degrade less gracefully than in MRI. Both use the same algorithm to concatenate, i.e. allocate a new string s3, copy str1 into s3, copy str2 into s3 at the appropriate offset, and return s3 [2].

MRI uses memcpy [3] to copy the input Strings into the final one. memcpy is pretty fast, and I'm reasonably certain that the compiler generates inline assembly for memcpy in any case. JRuby uses System#arraycopy instead. But arraycopy is implemented natively too (probably in assembly?), and some profiling [4] revealed that the array copy itself was not the time hog. Rather, it seems that something in org.jruby.util.ByteList (or below) is causing the time bloat in the algorithm.

A concatenation program [5] shows the differences in how string concatenation scales on the two platforms ([6], [7]). JRuby execution time goes up rather steeply after an inflection point, while MRI execution time continues to grow linearly (at least across this interval).

I haven't looked into ByteList.java to see what causes the time bloat. The source of the problem might really be time consumed by memory allocation in the VM (but probably not GC, since the HPROF profiles should certainly have caught that?).
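For readers without the attachment, the kind of measurement described above can be sketched roughly as follows. This is a minimal sketch and not the attached string-scalability.rb: the helper names concat_with_plus and bytes_copied, the chunk size, and the iteration counts are all assumptions chosen for illustration. It relies only on the allocate-and-copy behaviour of String#+ described in this report.

```ruby
require 'benchmark'

# Hypothetical helper (not from the attachment): builds a string by repeated
# String#+, which allocates a new string and copies both operands every time.
def concat_with_plus(iterations, chunk)
  result = ''
  iterations.times { result = result + chunk }
  result
end

# Total bytes copied over all iterations, assuming each + copies both
# operands in full: iteration i (1-based) copies i * chunk_size bytes.
def bytes_copied(iterations, chunk_size)
  chunk_size * iterations * (iterations + 1) / 2
end

if __FILE__ == $PROGRAM_NAME
  chunk = 'x' * 10 # assumed chunk size, chosen arbitrarily
  [1_000, 2_000, 4_000, 8_000].each do |n|
    seconds = Benchmark.realtime { concat_with_plus(n, chunk) }
    puts format('%6d iterations: %.4f s, about %d bytes copied',
                n, seconds, bytes_copied(n, chunk.size))
  end
end
```

Running the same file under both ruby and jruby and comparing the timings for each iteration count gives a rough picture of how each implementation scales. Note that the copying work alone grows quadratically with the iteration count under this pattern, so a small single-run timing like this is only indicative.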
References:

[2], [3] MRI version:

    VALUE
    rb_str_plus(str1, str2)
        VALUE str1, str2;
    {
        VALUE str3;

        StringValue(str2);
        str3 = rb_str_new(0, RSTRING(str1)->len+RSTRING(str2)->len);
        memcpy(RSTRING(str3)->ptr, RSTRING(str1)->ptr, RSTRING(str1)->len);
        memcpy(RSTRING(str3)->ptr + RSTRING(str1)->len,
               RSTRING(str2)->ptr, RSTRING(str2)->len);
        RSTRING(str3)->ptr[RSTRING(str3)->len] = '\0';

        if (OBJ_TAINTED(str1) || OBJ_TAINTED(str2))
            OBJ_TAINT(str3);
        return str3;
    }

JRuby's version:

    @JRubyMethod(name = "+", required = 1)
    public IRubyObject op_plus(ThreadContext context, IRubyObject other) {
        RubyString str = other.convertToString();

        ByteList result = new ByteList(value.realSize + str.value.realSize);
        result.realSize = value.realSize + str.value.realSize;
        System.arraycopy(value.bytes, value.begin, result.bytes, 0, value.realSize);
        System.arraycopy(str.value.bytes, str.value.begin, result.bytes, value.realSize, str.value.realSize);

        RubyString resultStr = newString(context.getRuntime(), result);
        if (isTaint() || str.isTaint()) resultStr.setTaint(true);
        return resultStr;
    }

[4] hprof.file-test.longer-run.txt and hprof.file-test.txt in the attachment.
[5] string-scalability.rb in the attachment.
[6] comparison.PNG in the attachment.
[7] More details on [6] are in jruby_vs_mri_string_scalability.ods in the attachment.