Consider, for example, the type varchar(n).
To enforce the upper limit it is no longer enough to check string.length(). One would have to iterate through the string to find out how many characters it contains.
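
A minimal sketch of such a check, assuming the limit n is meant to count code points rather than UTF-16 code units. The class and method names (VarcharLimit, fitsVarchar) are made up for illustration and are not H2 code; String.codePointCount does the iteration internally.

```java
class VarcharLimit {
    // Hypothetical helper: true if value contains at most n code points.
    static boolean fitsVarchar(String value, int n) {
        return value.codePointCount(0, value.length()) <= n;
    }

    public static void main(String[] args) {
        // 'A', the supplementary character U+20000, 'B': 3 code points, 4 chars.
        String s = "A\uD840\uDC00B";
        System.out.println(s.length() <= 3);   // false: length() counts code units
        System.out.println(fitsVarchar(s, 3)); // true: 3 code points
    }
}
```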

- rami

On 27.6.2011 2:38, cowwoc wrote:

Correct: http://stackoverflow.com/questions/1527856/how-can-i-iterate-through-the-unicode-codepoints-of-a-java-string

For what it's worth, I don't find codepoint iteration that much more complicated than char iteration. The main obstacle is to retrofit existing code.

All we need is something like FindBugs to point out which code snippets need to be retrofitted.

Gili

On 26/06/2011 6:27 PM, Rami Ojares wrote:
Ok, I get it now.
But I am pretty sure that supporting supplementary characters will not be as simple as just using the int methods in the Character class. All iterations over character sequences require new logic, and I am pretty sure the effort would far outweigh the benefit of supporting supplementary characters (unless you are doing big business in China, where the government demands the support).

"A\uD840\uDC00B".length() returns 4 even though the string has only 3 characters, because a char represents only a UTF-16 code unit, not a code point that can be uniquely mapped to a character.
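
To make the code unit versus code point distinction concrete, here is a small sketch that splits a code point into its UTF-16 code units. The class and method names (SurrogateDemo, toUtf16Hex) are invented for this example; Character.toChars and String.codePointAt are standard since Java 5.

```java
class SurrogateDemo {
    // Returns the UTF-16 code units of codePoint as 4-digit hex, space separated.
    static String toUtf16Hex(int codePoint) {
        StringBuilder sb = new StringBuilder();
        for (char c : Character.toChars(codePoint)) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%04X", (int) c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "A\uD840\uDC00B";
        System.out.println(s.length());                  // 4 code units
        System.out.println(s.codePointAt(1) == 0x20000); // true: one code point
        System.out.println(toUtf16Hex(0x20000));         // D840 DC00
        System.out.println(toUtf16Hex('A'));             // 0041
    }
}
```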

So iterating a character sequence looking for whitespace would have to be something like

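char[] ca = str.toCharArray();
for (int i = 0; i < ca.length; i++) {
  // Read the whole code point starting at index i (it may span two chars).
  int cp = Character.codePointAt(ca, i);
  // Skip the low surrogate of a supplementary character.
  if (Character.charCount(cp) == 2) i++;
  if (Character.isWhitespace(cp)) {
    ...
  }
  ...
}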

And so on and so forth.
So even though you are right, I think supporting supplementary characters all the way might be difficult,
since iterating over characters is such a common task
and usage of supplementary characters is so rare.
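
For comparison, a self-contained sketch of the whitespace scan discussed here, working directly on the String and advancing by Character.charCount. The names CodePointScan and indexOfWhitespace are hypothetical and only serve the illustration.

```java
class CodePointScan {
    // Hypothetical helper: char index of the first whitespace code point, or -1.
    static int indexOfWhitespace(String str) {
        for (int i = 0; i < str.length(); ) {
            int cp = str.codePointAt(i);
            if (Character.isWhitespace(cp)) {
                return i;
            }
            // Advance by 1 for BMP characters, by 2 for supplementary ones.
            i += Character.charCount(cp);
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(indexOfWhitespace("A\uD840\uDC00 B")); // 3
        System.out.println(indexOfWhitespace("AB"));              // -1
    }
}
```

The returned index is a char offset, so it can still be passed to substring or charAt.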

Just casting a char to int will not make any difference.

- rami

On 27.6.2011 0:33, cowwoc wrote:
Hi Rami,

Yes. See http://java.sun.com/developer/technicalArticles/Intl/Supplementary/ for more information.

Gili

On 26/06/2011 5:32 PM, Rami Ojares wrote:
But those characters cannot be represented as chars inside the JVM. And what about String? Can it contain characters that are not of type char?

- rami

On 27.6.2011 0:27, cowwoc wrote:
Hi Rami,

You're right that nbsp is treated the same by both methods (my mistake!), but H2 should still use the int variant because it accounts for Unicode characters that don't fit in 16 bits.
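
A short sketch of where the char and int variants actually diverge. It uses Character.isLetter rather than isWhitespace, purely because that makes the difference easy to see with the supplementary character U+20000; the class name IntVariantDemo is made up for this example.

```java
class IntVariantDemo {
    // U+20000, a supplementary CJK ideograph (needs two chars in UTF-16).
    static final int CJK_EXT_B = 0x20000;

    public static void main(String[] args) {
        char[] units = Character.toChars(CJK_EXT_B);
        // The int variant sees the whole code point.
        System.out.println(Character.isLetter(CJK_EXT_B)); // true
        // The char variant only ever sees one surrogate at a time.
        System.out.println(Character.isLetter(units[0]));  // false
        System.out.println(Character.isLetter(units[1]));  // false
    }
}
```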

Gili

On 26/06/2011 5:08 PM, Rami Ojares wrote:
On 26.6.2011 23:21, cowwoc wrote:
we should really be using isWhitespace(int) because it is newer/better. In general you're supposed to ignore the methods that take a char parameter.

I don't understand what you mean.
A recent JDK 1.6 returns the following:

Character.isSpaceChar(' ') -> true
Character.isSpaceChar((int) ' ') -> true
Character.isWhitespace(' ') -> false
Character.isWhitespace((int) ' ') -> false

So to me there seems to be no difference between the char and int methods.
(Note: the argument character is an nbsp, U+00A0.)
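
The same check as a runnable sketch, with the nbsp written as an explicit escape so it cannot be confused with a regular space. The class name NbspCheck and the constant NBSP are made up for this example.

```java
class NbspCheck {
    // Non-breaking space, U+00A0.
    static final char NBSP = '\u00A0';

    public static void main(String[] args) {
        System.out.println(Character.isSpaceChar(NBSP));        // true
        System.out.println(Character.isSpaceChar((int) NBSP));  // true
        System.out.println(Character.isWhitespace(NBSP));       // false
        System.out.println(Character.isWhitespace((int) NBSP)); // false
        // A regular space, for contrast.
        System.out.println(Character.isWhitespace(' '));        // true
    }
}
```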

- rami







--
You received this message because you are subscribed to the Google Groups "H2 
Database" group.