[ https://issues.apache.org/jira/browse/DERBY-5068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032532#comment-13032532 ]

Knut Anders Hatlen commented on DERBY-5068:
-------------------------------------------

Thanks for looking at the patch, Dag. I'm still learning the API myself. :)

You're probably right that we should handle those conditions. I'm not sure how 
unmappable-character errors can happen with UTF-8, but malformed-input errors do 
seem to be raised for characters in the range \uD800 to \uDFFF (unpaired 
surrogates).
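
A quick standalone check (nothing Derby-specific; the class name is just for 
illustration) shows the encoder reporting malformed input for an unpaired 
surrogate:

    import java.nio.ByteBuffer;
    import java.nio.CharBuffer;
    import java.nio.charset.Charset;
    import java.nio.charset.CharsetEncoder;
    import java.nio.charset.CoderResult;

    public class SurrogateEncodeDemo {
        public static void main(String[] args) {
            // The default error actions are REPORT, so the encoder signals
            // the problem through the CoderResult instead of substituting
            // replacement bytes.
            CharsetEncoder encoder = Charset.forName("UTF-8").newEncoder();
            CharBuffer in = CharBuffer.wrap("\uD800"); // unpaired surrogate
            ByteBuffer out = ByteBuffer.allocate(16);
            CoderResult result = encoder.encode(in, out, true);
            System.out.println(result); // prints MALFORMED[1]
        }
    }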

We have two alternatives (both sketched in code below):

1) Make the CharsetEncoder replace problematic characters with '?' instead of 
reporting an error. (By calling onMalformedInput() and onUnmappableCharacter() 
with CodingErrorAction.REPLACE.)

2) Detect and report the conditions. (By checking the CoderResult and raising 
an exception.)
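
Roughly, the two alternatives would look like this. This is a minimal sketch, 
not the actual CcsidManager code; the method names are just for illustration, 
and in the client option 2 would of course raise the appropriate SqlException 
rather than an IllegalArgumentException:

    import java.nio.ByteBuffer;
    import java.nio.CharBuffer;
    import java.nio.charset.Charset;
    import java.nio.charset.CharsetEncoder;
    import java.nio.charset.CoderResult;
    import java.nio.charset.CodingErrorAction;

    public class EncodeAlternatives {
        // Option 1: silently substitute the charset's default replacement,
        // which for UTF-8 is '?' (0x3F).
        static ByteBuffer encodeOption1(String s) {
            CharsetEncoder encoder = Charset.forName("UTF-8").newEncoder()
                    .onMalformedInput(CodingErrorAction.REPLACE)
                    .onUnmappableCharacter(CodingErrorAction.REPLACE);
            ByteBuffer out = ByteBuffer.allocate(
                    (int) (s.length() * encoder.maxBytesPerChar()));
            encoder.encode(CharBuffer.wrap(s), out, true);
            encoder.flush(out);
            out.flip();
            return out;
        }

        // Option 2: keep the default REPORT action, check the CoderResult
        // and raise an exception if the string cannot be encoded.
        static ByteBuffer encodeOption2(String s) {
            CharsetEncoder encoder = Charset.forName("UTF-8").newEncoder();
            ByteBuffer out = ByteBuffer.allocate(
                    (int) (s.length() * encoder.maxBytesPerChar()));
            CoderResult result = encoder.encode(CharBuffer.wrap(s), out, true);
            if (result.isError()) {
                throw new IllegalArgumentException(
                        "cannot encode string: " + result);
            }
            encoder.flush(out);
            out.flip();
            return out;
        }
    }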

Option 2 sounds like the right thing to do. However, the original code used 
String.getBytes(String) to do the encoding, which effectively implements option 
1 (the API javadoc leaves the behaviour unspecified when the string cannot be 
encoded, but in practice the problematic characters are replaced with '?'). 
Also, we still have the convertFromJavaString(String,Agent) method, which 
matches option 1.
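
For reference, this is the substitution the old getBytes() call gave us (the 
output shown is what I see on the JDKs I've tried, since the javadoc doesn't 
promise it):

    import java.io.UnsupportedEncodingException;
    import java.util.Arrays;

    public class GetBytesDemo {
        public static void main(String[] args)
                throws UnsupportedEncodingException {
            String s = "a\uD800b"; // 'a', unpaired surrogate, 'b'
            byte[] bytes = s.getBytes("UTF-8");
            // The unencodable character comes out as 0x3F ('?'),
            // i.e. the same behaviour as option 1.
            System.out.println(Arrays.toString(bytes)); // [97, 63, 98]
        }
    }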

On the other hand, all the encoding methods in EbcdicCcsidManager do raise an 
exception if the string contains characters outside the EBCDIC range, so there's 
no clear precedent. Whichever we choose, we should make all these methods 
consistent. I think my preference would be option 2.

> Investigate increased CPU usage on client after introduction of UTF-8 
> CcsidManager
> ----------------------------------------------------------------------------------
>
>                 Key: DERBY-5068
>                 URL: https://issues.apache.org/jira/browse/DERBY-5068
>             Project: Derby
>          Issue Type: Task
>    Affects Versions: 10.7.1.1
>            Reporter: Knut Anders Hatlen
>         Attachments: d5068-1a.diff, d5068-2a.diff, d5068-2a.stat
>
>
> While looking at the performance graphs for the single-record select test 
> during the last year - 
> http://home.online.no/~olmsan/derby/perf/select_1y.html - I noticed that 
> there was a significant increase (10-20%) in CPU usage per transaction on the 
> client early in October 2010. To be precise, the increase seems to have 
> happened between revision 1004381 and revision 1004794. In that period, there 
> were three commits: two related to DERBY-4757, and one related to DERBY-4825 
> (tests only).
> We should try to find out what's causing the increased CPU usage and see if 
> there's some way to reduce it.
