[
https://issues.apache.org/jira/browse/ABDERA-60?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12525262
]
Chris Berry commented on ABDERA-60:
-----------------------------------
We figured it out. AFAICT, both my issue and Herbert's are the same.
I believe this is a bug in Abdera.
There are actually two issues:
-----------------------
First, Abdera uses HttpClient's
method.getResponseBodyAsStream();
to obtain a raw stream of bytes for Woodstox (which is the correct thing
to do for performance).
But Woodstox does NOT assume UTF-8, so it fails when parsing valid UTF-8
characters.
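As a rough illustration of the failure mode (this is a generic sketch, not a reproduction of what Woodstox does internally), the snippet below decodes the same UTF-8 bytes through a Reader with two different charsets. The class name DecodeDemo and the sample string are made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; nothing here is Abdera or Woodstox code.
final class DecodeDemo {

    // Reads every character from the byte array using the given charset.
    static String decode(byte[] bytes, Charset charset) {
        try (Reader reader =
                 new InputStreamReader(new ByteArrayInputStream(bytes), charset)) {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String original = "caf\u00e9"; // "cafe" with an e-acute
        byte[] wire = original.getBytes(StandardCharsets.UTF_8);

        // Decoding with the charset the bytes were written in round-trips.
        System.out.println(decode(wire, StandardCharsets.UTF_8).equals(original));
        // Decoding with a different charset yields different characters.
        System.out.println(decode(wire, StandardCharsets.ISO_8859_1).equals(original));
    }
}
```

The point is only that whoever turns the bytes into characters has to be told, or has to detect, the right charset.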
The fix is to change the following line in AbstractClientResponse:
    public <T extends Element> Document<T> getDocument(Parser parser,
                                                       ParserOptions options)
        throws ParseException {
      try {
        .......
        // Document<T> doc = parser.parse(getInputStream(), base, options);
        Document<T> doc = parser.parse(getReader(), base, options);
        ....
And to add the following method to AbstractClientResponse:
    public java.io.Reader getReader() throws java.io.IOException {
      String header = getHeader("Content-Type");
      String type = "UTF-8"; // default to UTF-8
      if (header != null) { // the header may be absent
        // find() locates the parameter anywhere in the header; the value
        // stops at whitespace, ';' or '"' so quotes are not captured
        java.util.regex.Matcher matcher =
          java.util.regex.Pattern.compile("charset\\s*=\\s*\"?([^\\s;\"]+)",
            java.util.regex.Pattern.CASE_INSENSITIVE).matcher(header);
        if (matcher.find()) {
          type = matcher.group(1);
        }
      }
      return new java.io.InputStreamReader(getInputStream(), type);
    }
Although there is likely a cleaner way to get the "charset" param in Abdera?
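For what it's worth, the charset lookup can be pulled out into a small standalone helper so it can be tested without an HTTP response. This is only a sketch under my own assumptions: ContentTypeCharset and charsetOf are hypothetical names, not Abdera API, and falling back to UTF-8 for a missing or unknown charset is a choice made for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; not part of Abdera.
final class ContentTypeCharset {

    // Matches a ";charset=..." parameter, optionally quoted, case-insensitively.
    private static final Pattern CHARSET = Pattern.compile(
        "(?i);\\s*charset\\s*=\\s*\"?([^\\s;\"]+)\"?");

    // Returns the charset named in the Content-Type value, or UTF-8 when the
    // header is missing, has no charset parameter, or names an unknown charset.
    static Charset charsetOf(String contentType) {
        if (contentType == null) {
            return StandardCharsets.UTF_8;
        }
        Matcher m = CHARSET.matcher(contentType);
        if (!m.find()) {
            return StandardCharsets.UTF_8;
        }
        try {
            return Charset.forName(m.group(1));
        } catch (IllegalArgumentException e) {
            // Covers both illegal and unsupported charset names.
            return StandardCharsets.UTF_8;
        }
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("application/atom+xml; charset=\"ISO-8859-1\""));
        System.out.println(charsetOf("application/atom+xml"));
    }
}
```

With something like this, getReader() would reduce to new java.io.InputStreamReader(getInputStream(), charsetOf(getHeader("Content-Type"))).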
-----------------------------
Second, Abdera is NOT adding the "charset" parameter (e.g. ";charset=utf-8")
to the Content-Type HTTP header of the response.
So a fix might be to change the following line in BaseResponseContext:
    public BaseResponseContext(T base, boolean chunked) {
      this.base = base;
      setStatus(200);
      setStatusText("OK");
      this.chunked = chunked;
      try {
        // setContentType(getContentType().toString());
        setContentType(getContentType().toString() + "; charset=utf-8");
      } catch (Exception e) {}
    }
Although there are likely better ways/places to accomplish this within Abdera.
Perhaps I need to set this in my SpringAbderaServlet??
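Wherever this ends up living, blindly appending "; charset=utf-8" would produce a duplicate parameter if the content type already carries one. A minimal sketch of a guard follows; ContentTypes and withDefaultCharset are hypothetical names for illustration, and splitting on ';' is a simplification that ignores quoted parameter values containing semicolons.

```java
// Hypothetical helper; not part of Abdera.
final class ContentTypes {

    // Appends "; charset=<charset>" unless a charset parameter is already present.
    static String withDefaultCharset(String contentType, String charset) {
        if (contentType == null || contentType.isEmpty()) {
            return contentType;
        }
        String[] parts = contentType.split(";");
        // parts[0] is the media type itself; parameters start at index 1.
        for (int i = 1; i < parts.length; i++) {
            int eq = parts[i].indexOf('=');
            if (eq > 0 && parts[i].substring(0, eq).trim().equalsIgnoreCase("charset")) {
                return contentType; // already has a charset, leave it alone
            }
        }
        return contentType + "; charset=" + charset;
    }

    public static void main(String[] args) {
        System.out.println(withDefaultCharset("application/atom+xml", "utf-8"));
        System.out.println(withDefaultCharset("text/xml; charset=ISO-8859-1", "utf-8"));
    }
}
```

The constructor change above would then call setContentType(withDefaultCharset(getContentType().toString(), "utf-8")) instead of concatenating unconditionally.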
> Invalid UTF-8 chars in the AbderaClient
> ---------------------------------------
>
> Key: ABDERA-60
> URL: https://issues.apache.org/jira/browse/ABDERA-60
> Project: Abdera
> Issue Type: Bug
> Affects Versions: 0.3.0
> Environment: N/A
> Reporter: Chris Berry
> Fix For: 0.3.0
>
> Attachments: abdera-utf8-bug.tar.gz
>
>
> After upgrading to the latest 0.3-SNAPSHOT SVN trunk (on ~8/27/2007) from a
> 0.3-SNAPSHOT download from a couple of months ago,
> and after making all required modifications (to catch up with all the API
> changes), I am seeing "Invalid UTF-8" errors.
> Note that these errors only occur in the AbderaClient when I call
> "entry.getContent()".
> I have attached a small, self-contained JUnit test case which
> reproduces/demonstrates this issue.
> It builds and runs out-of-the-box (using mvn install).
> There is also a README.txt that details the output/issue.
> This JUnit reproduces the error. It is as small as I could get it.
> My Atom Store is based on a Store and StoreProvider (based on code I received
> from Ugo Cei as a starting point).
> Note that all of the code in src/main/java is relatively fixed between the
> latest 0.3-SNAPSHOT and the 0.3-SNAPSHOT that works.
> In other words, my code stayed as fixed as possible, and the latest
> 0.3-SNAPSHOT is the only real variable.
> I'm not saying that the bug isn't in my code, only that it never showed up
> until my upgrade to 0.3-SNAPSHOT.
> I actually suspect that it may be an issue w/ woodstox, which the latest
> 0.3-SNAPSHOT significantly upgrades.
> Note: I have looked very closely at the XML file(s) that are causing this
> issue.
> I ran the Unix utility "iconv" on them, and AFAICT they do not contain
> improper UTF-8.
> Chris Berry
> chriswberry at gmail dot com