> I mean that I want example code that shows a good pattern for dealing
> with multibyte strings :-) For example, I'm not sure whether this code
> is good or not:
> str _ UnicodeString fromString: 'Some UTF-8 Encoded String'.
It is good if your default encoding is UTF-8, or if the encoded string
starts with a byte-order mark (for the latter, you need the attached
patch :-( ...).
For example, this works:
st> #[254 255 200 4 193 49 201 196] asString encoding!
'UTF-16BE'
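The same check can be tried with a UTF-8 byte-order mark. This is only a sketch: it assumes the I18N package is already loaded and the attached patch is applied, and the byte values are made up for illustration. Going by the patched `encoding` method, none of the UTF-32 or UTF-16 tests match these bytes and the three-byte test for EF BB BF does, so the answer should be 'UTF-8' whatever the default locale is.

```smalltalk
"Hypothetical input: the UTF-8 byte-order mark (EF BB BF) followed by
 the two ASCII bytes for 'hi'. Assumes the patched #encoding method."
st> #[16rEF 16rBB 16rBF 104 105] asString encoding!
```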
> str _ UnicodeString fromString: 'Some UTF-8 Encoded String' encoding: UTF8StringEncoding.
The encoding is named by a string: UTF8StringEncoding is written 'UTF-8'.
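With that correction the call from the question would look roughly like the line below. Treat it as a sketch rather than checked code: the `fromString:encoding:` selector is copied from the question as asked, and the only change is that the encoding is passed as the string 'UTF-8'.

```smalltalk
"Encoding given as a charset-name string rather than as an identifier.
 The selector fromString:encoding: is taken from the question, not verified here."
str _ UnicodeString fromString: 'Some UTF-8 Encoded String' encoding: 'UTF-8'.
```

Naming the encoding explicitly avoids depending on the default locale or on a byte-order mark being present in the data.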
Paolo
* auto-adding [EMAIL PROTECTED]/smalltalk--devo--2.2--patch-152 to greedy
revision library /Users/bonzinip/Archives/revlib
* found immediate ancestor revision in library ([EMAIL PROTECTED]/smalltalk--devo--2.2--patch-151)
* patching for this revision ([EMAIL PROTECTED]/smalltalk--devo--2.2--patch-152)
--- orig/i18n/Sets.st
+++ mod/i18n/Sets.st
@@ -1289,21 +1289,21 @@ encoding
default locale's default charset"
| encoding |
- (self size >= 4 and: [ (self at: 1) = 0 and: [ (self at: 2) = 0 and: [
- (self at: 3) = 254 and: [
- (self at: 4) = 255 ]]]]) ifTrue: [ ^'UTF-32BE' ].
- (self size >= 4 and: [ (self at: 4) = 0 and: [ (self at: 3) = 0 and: [
- (self at: 2) = 254 and: [
- (self at: 1) = 255 ]]]]) ifTrue: [ ^'UTF-32LE' ].
+ (self size >= 4 and: [ (self valueAt: 1) = 0 and: [ (self valueAt: 2) = 0 and: [
+ (self valueAt: 3) = 254 and: [
+ (self valueAt: 4) = 255 ]]]]) ifTrue: [ ^'UTF-32BE' ].
+ (self size >= 4 and: [ (self valueAt: 4) = 0 and: [ (self valueAt: 3) = 0 and: [
+ (self valueAt: 2) = 254 and: [
+ (self valueAt: 1) = 255 ]]]]) ifTrue: [ ^'UTF-32LE' ].
(self size >= 2 and: [
- (self at: 1) = 254 and: [
- (self at: 2) = 255 ]]) ifTrue: [ ^'UTF-16BE' ].
+ (self valueAt: 1) = 254 and: [
+ (self valueAt: 2) = 255 ]]) ifTrue: [ ^'UTF-16BE' ].
(self size >= 2 and: [
- (self at: 2) = 254 and: [
- (self at: 1) = 255 ]]) ifTrue: [ ^'UTF-16LE' ].
- (self size >= 3 and: [ (self at: 1) = 16rEF and: [
- (self at: 2) = 16rBB and: [
- (self at: 3) = 16rBF ]]]) ifTrue: [ ^'UTF-8' ].
+ (self valueAt: 2) = 254 and: [
+ (self valueAt: 1) = 255 ]]) ifTrue: [ ^'UTF-16LE' ].
+ (self size >= 3 and: [ (self valueAt: 1) = 16rEF and: [
+ (self valueAt: 2) = 16rBB and: [
+ (self valueAt: 3) = 16rBF ]]]) ifTrue: [ ^'UTF-8' ].
encoding := self class defaultEncoding.
encoding asString = 'UTF-16' ifTrue: [ ^self utf16Encoding ].
@@ -1314,9 +1314,9 @@ utf32Encoding
"Assuming the receiver is encoded as UTF-16 with a proper
endianness marker, answer the correct encoding of the receiver."
- (self size >= 4 and: [ (self at: 4) = 0 and: [ (self at: 3) = 0 and: [
- (self at: 2) = 254 and: [
- (self at: 1) = 255 ]]]]) ifTrue: [ ^'UTF-32LE' ].
+ (self size >= 4 and: [ (self valueAt: 4) = 0 and: [ (self valueAt: 3) = 0 and: [
+ (self valueAt: 2) = 254 and: [
+ (self valueAt: 1) = 255 ]]]]) ifTrue: [ ^'UTF-32LE' ].
^'UTF-32BE'
!
@@ -1325,8 +1325,8 @@ utf16Encoding
endianness marker, answer the correct encoding of the receiver."
(self size >= 2 and: [
- (self at: 2) = 254 and: [
- (self at: 1) = 255 ]]) ifTrue: [ ^'UTF-16LE' ].
+ (self valueAt: 2) = 254 and: [
+ (self valueAt: 1) = 255 ]]) ifTrue: [ ^'UTF-16LE' ].
^'UTF-16BE'
! !
_______________________________________________
help-smalltalk mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/help-smalltalk