Re: [jibx-users] issue with encoding

2004-07-12 Thread Dennis Sosnoski
I added an input stream wrapper in the CVS code which handles detecting 
and processing the character encoding used for an input document 
supplied as an input stream. It turns out parser support for detecting 
and handling character encodings is optional with XMLPull, which I 
hadn't realized before. The approach I implemented will handle this 
independent of the parser.
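
A minimal sketch of what such a wrapper could look like, assuming the encoding is taken from the XML declaration at the start of the stream. This is not the code committed to CVS: the class name `XmlEncodingSniffer`, the 128-byte peek limit, and the fallback to UTF-8 are all illustrative assumptions, and byte-order marks and UTF-16 input are deliberately ignored.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of JiBX or XMLPull.
public class XmlEncodingSniffer {
    // Assumed upper bound on how many bytes are inspected for a declaration.
    private static final int PEEK_LIMIT = 128;

    private static final Pattern ENCODING = Pattern.compile(
        "encoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    /**
     * Returns the encoding named in the XML declaration, or "UTF-8" if none
     * is found. All bytes read are pushed back, so the stream is unchanged.
     * The pushback buffer must hold at least PEEK_LIMIT bytes.
     */
    public static String detect(PushbackInputStream in) throws IOException {
        byte[] head = new byte[PEEK_LIMIT];
        int total = 0;
        while (total < PEEK_LIMIT) {
            int n = in.read(head, total, PEEK_LIMIT - total);
            if (n < 0) {
                break;
            }
            total += n;
        }
        if (total > 0) {
            in.unread(head, 0, total);
        }
        // The declaration is plain ASCII in any ASCII-compatible encoding,
        // so decoding the peeked bytes as ISO-8859-1 is lossless here.
        String text = new String(head, 0, total, StandardCharsets.ISO_8859_1);
        if (text.startsWith("<?xml")) {
            int end = text.indexOf("?>");
            if (end > 0) {
                Matcher m = ENCODING.matcher(text.substring(0, end));
                if (m.find()) {
                    return m.group(1);
                }
            }
        }
        return "UTF-8";
    }

    /** Wraps a raw stream in a Reader that uses the detected encoding. */
    public static Reader open(InputStream raw) throws IOException {
        PushbackInputStream in = new PushbackInputStream(raw, PEEK_LIMIT);
        return new InputStreamReader(in, detect(in));
    }
}
```

Because the decoding happens before the parser sees any data, the parser can be handed a `Reader` and never has to interpret the declaration itself.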

I did also confirm and fix one error in the UTF-8 encoding, which 
affects character codes in the 0x800-0x3FFF range.

I'm hoping to avoid yet another release in the beta 3 series, so I'll 
probably just refer users to CVS if they need this support prior to beta 4.

 - Dennis
Dennis Sosnoski wrote:
Actually, I thought I'd noticed an error in the ISO-8859-1 code but on 
further examination it looks good (and works okay in my tests, too).

What *does* appear to be a problem is if you don't specify an encoding 
for an input stream that starts with an XML declaration specifying 
UTF-8 (<?xml version="1.0" encoding="UTF-8"?>). It looks like the 
parser is not correctly interpreting the input in this case. I'm 
investigating further, but thought I'd let people know the story.

If you know you're going to be working with UTF-8 documents, a 
workaround for now is to just specify the encoding when you set the 
input stream. That appears to work properly.
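
To illustrate why naming the encoding matters, here is a small sketch using only `java.io`. It does not call any JiBX API; the class and method names are made up for the example. The same bytes decode correctly when the right charset is named and turn into the familiar two-character garbage when the wrong one is used.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;

// Hypothetical demo class; shows explicit decoding with plain java.io only.
public class ExplicitEncodingRead {
    /** Reads the whole stream, decoding bytes with the named encoding. */
    static String readAll(InputStream in, String encoding) throws IOException {
        Reader reader = new InputStreamReader(in, encoding);
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[256];
        int n;
        while ((n = reader.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }
}
```

Reading the UTF-8 bytes of "Ginestière" with `"UTF-8"` gives the original text back, while reading them with `"ISO-8859-1"` yields "GinestiÃ¨re".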

 - Dennis
HD wrote:
Ok I added the bug in the Jira with a simple JUnit testcase. The 
UTF-8 encoding fails with accents. I'm glad you found out the ISO 
issue because I can't reproduce it :-(

Henri.
HD 1meyrxd02-at-sneakemail.com |JiBX| wrote:
With UTF-8, it seems like when the XML file is read, the encoding is 
not taken into account and all UTF-8 escape characters are not 
translated backwards...
So I don't get the same bug as ISO-8859-1 but accents are not 
translated back into accents.

Henri.
Dennis Sosnoski dms-at-sosnoski.com |JiBX| wrote:
Actually, the problem I noticed is only for ISO-8859-1 - do you 
also see a problem when using UTF-8?

 - Dennis
Dennis Sosnoski wrote:
I see that there's an error in the encoding handling that I'd 
missed. Most of the test cases are just using ASCII characters, 
though I thought I had a few that went outside the set. I'll get 
it fixed in CVS as soon as I can, and will also add it to the test 
suite. If you can get a simple example code for this and attach it 
to a Jira issue I'll make sure it works properly for your data. 
Thanks,

 - Dennis
HD wrote:
 

I tried to use the UTF-8 and ISO-8859-1 encodings but there seem 
to be some strange things happening with the output encoding: all 
the French accents generate these strange characters.
For instance: rte st Antoine de Ginestière becomes
   <CT_Adresse>rte Saint Antoine de 
GinestiÃ#x0192;Æ#x2019;Ã#x2020;@#x2122;Ã#x0192;@ 
@D¢Ã#x0192;Æ#x2019;@Å¡Ã#x0192;@#x0161;Ã#x201K;¨re</CT_Adresse>

This particular string was encoded with ISO-8859-1. But I get 
these strange characters too in UTF-8. I'm wondering what 
encoding was used to compile JiBX ?
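
Output like this usually comes from text being encoded as UTF-8 and then decoded with a single-byte charset, possibly more than once. The sketch below reproduces that effect under stated assumptions: it uses ISO-8859-1 as the wrong charset because every JVM provides it, and the class and method names are invented for the example. The `#x0192` and `#x2019` fragments in the report would fit Windows-1252 rather than ISO-8859-1, but that is a guess, and nothing here claims to be the actual cause inside JiBX.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo of repeated mis-decoding; not taken from JiBX.
public class MojibakeRounds {
    /** Encodes as UTF-8 and decodes with the wrong charset, 'rounds' times. */
    static String misdecode(String s, Charset wrong, int rounds) {
        for (int i = 0; i < rounds; i++) {
            s = new String(s.getBytes(StandardCharsets.UTF_8), wrong);
        }
        return s;
    }

    /** Reverses misdecode when the wrong charset maps every byte losslessly. */
    static String repair(String s, Charset wrong, int rounds) {
        for (int i = 0; i < rounds; i++) {
            s = new String(s.getBytes(wrong), StandardCharsets.UTF_8);
        }
        return s;
    }

    public static void main(String[] args) {
        String original = "Ginesti\u00e8re";
        for (int r = 0; r <= 3; r++) {
            String garbled = misdecode(original, StandardCharsets.ISO_8859_1, r);
            System.out.println(r + " round(s): " + garbled.length() + " chars");
        }
    }
}
```

Each round doubles the number of characters that stand in for the single accented letter, which is why a string that has passed through several conversions looks so much worse than one that has been mis-decoded once.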

Henri.


---
This SF.Net email sponsored by Black Hat Briefings & Training.
Attend Black Hat Briefings & Training, Las Vegas July 24-29 - 
digital self defense, top technical experts, no vendor pitches, 
unmatched networking opportunities. Visit www.blackhat.com
___
jibx-users mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/jibx-users


Re: [jibx-users] issue with encoding

2004-07-07 Thread HD
Ok I added the bug in the Jira with a simple JUnit testcase. The 
UTF-8 encoding fails with accents. I'm glad you found out the ISO 
issue because I can't reproduce it :-(

Henri.


Re: [jibx-users] issue with encoding

2004-07-06 Thread Dennis Sosnoski
I see that there's an error in the encoding handling that I'd missed. 
Most of the test cases are just using ASCII characters, though I thought 
I had a few that went outside the set. I'll get it fixed in CVS as soon 
as I can, and will also add it to the test suite. If you can get a 
simple example code for this and attach it to a Jira issue I'll make 
sure it works properly for your data. Thanks,

 - Dennis


Re: [jibx-users] issue with encoding

2004-07-06 Thread Dennis Sosnoski
Actually, the problem I noticed is only for ISO-8859-1 - do you also see 
a problem when using UTF-8?

 - Dennis


Re: [jibx-users] issue with encoding

2004-07-06 Thread HD
With UTF-8, it seems like when the XML file is read, the encoding is 
not taken into account and all UTF-8 escape characters are not 
translated backwards...
So I don't get the same bug as ISO-8859-1 but accents are not 
translated back into accents.

Henri.
