On Wed, 15 Jun 2005, Roland Weber wrote:
> Hello Andrew,
>
> 1. Use HttpMethod.getResponseBodyAsStream().
Yes, I do that now.
> 2. Are you sure that the question marks are actually in the
> string? It could be that they appear only when you try to
> *print* the string.
The character really is ASCII code 0x3F, so it is a literal question mark.
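For anyone who wants to repeat this check, here is a minimal sketch of one way to look at the code points of a string instead of trusting the console. The class and method names (CodePointDump, dump) are made up for this illustration and are not part of HttpClient.

```java
import java.util.StringJoiner;

class CodePointDump {
    /** Returns the code points of s as "U+XXXX" tokens separated by spaces. */
    static String dump(String s) {
        StringJoiner joiner = new StringJoiner(" ");
        s.codePoints().forEach(cp -> joiner.add(String.format("U+%04X", cp)));
        return joiner.toString();
    }

    public static void main(String[] args) {
        // A literal question mark: prints U+003F
        System.out.println(dump("?"));
        // Two kanji, written as escapes so the source file encoding does not matter:
        // prints U+65E5 U+672C
        System.out.println(dump("\u65E5\u672C"));
    }
}
```

If the dump shows U+003F where a kanji was expected, the damage happened before printing, not in the terminal.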
>
> 3. If the question marks really are in the string, the server
> probably sent an inappropriate charset value, or none
> at all. Anyway, it's better to do 1) and parse the HTML
> code for a charset specification. You'll have to parse
> it anyway in your robot.
The server sends Shift_JIS as the page charset.
Here is my code now:
............
result = new HttpResponse ( method.getResponseBodyAsStream (),
                            method.getResponseCharSet () );
.........
// in the HttpResponse constructor:
HttpResponse ( InputStream responseBodyAsStream, String charset )
        throws IOException {
    BufferedReader reader = new BufferedReader (
            new InputStreamReader ( responseBodyAsStream, charset ) );
    String line = null;
    while ( ( line = reader.readLine () ) != null ) {
        this.add ( line );
        out.write ( line );
        out.write ( "\n" );
    }
}
It works. :)
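The same idea can be tried without any network access. The sketch below is only an illustration under stated assumptions: a ByteArrayInputStream stands in for the HTTP response stream, the name StreamDecodeSketch is hypothetical, and the JRE is assumed to ship the Shift_JIS charset.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

class StreamDecodeSketch {
    /** Reads all lines from in, decoding the raw bytes with the named charset. */
    static List<String> readLines(InputStream in, String charsetName) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, Charset.forName(charsetName)))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            // Rethrow unchecked to keep the sketch short.
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Simulated response body: three kanji and an ASCII line, encoded as Shift_JIS.
        String original = "\u65E5\u672C\u8A9E\nsecond line";
        byte[] body = original.getBytes(Charset.forName("Shift_JIS"));

        List<String> lines = readLines(new ByteArrayInputStream(body), "Shift_JIS");
        System.out.println(lines.size());                               // prints 2
        System.out.println(lines.get(0).equals("\u65E5\u672C\u8A9E")); // prints true
    }
}
```

The important part is that the charset is handed to InputStreamReader explicitly, so the bytes are never decoded with a default that does not match the page.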
It's funny, but
http://jakarta.apache.org/commons/httpclient/3.0/charencodings.html
says: "If the response is known to be a String, you can use the
getResponseBodyAsString method which will automatically use the encoding
specified in the Content-Type header or ISO-8859-1 if no charset is
specified."
The Content-Type for this page is "text/html; charset=Shift_JIS", so I really
thought that HttpClient would convert the body automatically... :(
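To make the rule quoted from that page concrete, here is a hand-rolled sketch of picking the charset out of a Content-Type value and falling back to ISO-8859-1 when none is given. It is not HttpClient's actual implementation; the class name ContentTypeCharset and the regular expression are my own assumptions for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ContentTypeCharset {
    // Matches a "charset=..." parameter, with or without double quotes.
    private static final Pattern CHARSET_PARAM = Pattern.compile(
            ";\\s*charset\\s*=\\s*\"?([^\\s;\"]+)\"?", Pattern.CASE_INSENSITIVE);

    /** Returns the charset named in the header value, or ISO-8859-1 if none is given. */
    static String charsetOf(String contentType) {
        if (contentType != null) {
            Matcher m = CHARSET_PARAM.matcher(contentType);
            if (m.find()) {
                return m.group(1);
            }
        }
        return "ISO-8859-1";
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html; charset=Shift_JIS")); // prints Shift_JIS
        System.out.println(charsetOf("text/html"));                    // prints ISO-8859-1
    }
}
```

With the header this server sends, the sketch yields Shift_JIS, which is the value I pass to InputStreamReader in the code above.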
>
> hope that helps,
> Roland
>
> "Andrew A. Sabitov" <[EMAIL PROTECTED]>
> 15.06.2005 06:20
> Please respond to "HttpClient User Discussion"
> To: [email protected]
> Subject: Japanese charset?
>
> Hi all!
>
> Could anybody be so kind as to help me? I need to write a robot that will
> fetch some data from amazon.co.jp. It will run under Linux.
>
>
> This URL is a point of start for me:
> http://s1.amazon.co.jp/exec/varzea/subst/your-account/downloadable-reports.html
>
>
> The code of the class that downloads the page is below. The problem is that
> method.getResponseBodyAsString() returns a string in which all Japanese
> characters are replaced by question marks.
>
> How can I fix this problem?
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
> import java.io.FileWriter;
> import java.io.IOException;
>
> import org.apache.commons.httpclient.Cookie;
> import org.apache.commons.httpclient.HostConfiguration;
> import org.apache.commons.httpclient.HttpConnection;
> import org.apache.commons.httpclient.HttpException;
> import org.apache.commons.httpclient.HttpState;
> import org.apache.commons.httpclient.HttpStatus;
> import org.apache.commons.httpclient.URI;
> import org.apache.commons.httpclient.protocol.Protocol;
> import org.apache.commons.httpclient.cookie.CookiePolicy;
> import org.apache.commons.httpclient.methods.GetMethod;
>
> import ru.pp.sabitov.common.HttpResponse;
>
> public class Client {
>
> private String url = null;
>
> private HttpConnection connection = null;
> private Cookie[] cookies = null;
>
> private String proxyHost = null;
> private int proxyPort = -1;
>
> public Client () {
>
> public void setProxy ( String host, String port ) {
>
> public void setProxy ( String host, int port ) {
>
> public HttpResponse openGetHttpConnection ( String url ) throws
> NullPointerException, HttpException, IOException {
> HttpResponse result = null;
>
> System.out.println ( url );
>
> URI uri = new URI ( url.toCharArray () );
>
> String schema = uri.getScheme ();
> if ( ( schema == null ) || ( schema.equals ( "" ) ) ) {
> schema = "http";
> }
> Protocol protocol = Protocol.getProtocol ( schema );
>
> HttpState state = new HttpState ();
> state.setCookiePolicy ( CookiePolicy.RFC2109 );
> if ( cookies != null ) {
> for ( int idx = 0; idx < cookies.length; idx++ ) {
> Cookie cookie = cookies [ idx ];
> System.out.println ( "Cookie: " + cookie );
> state.addCookie ( cookie );
> }
> }
>
> String host = uri.getHost ();
> int port = uri.getPort ();
> GetMethod method = new GetMethod ( uri.toString () );
> method.setFollowRedirects ( true );
>
> HostConfiguration hostConfig = new HostConfiguration();
> if ( ( proxyHost != null ) && ( proxyPort != -1 ) ) {
> hostConfig.setProxy( proxyHost, proxyPort );
> }
>
> org.apache.commons.httpclient.HttpClient client = new
> org.apache.commons.httpclient.HttpClient ();
> client.setHostConfiguration( hostConfig );
> client.setState ( state );
> client.executeMethod( method );
>
> if ( method.getStatusCode() == HttpStatus.SC_OK ) {
> cookies = client.getState().getCookies ();
> FileWriter w = new FileWriter ("123.txt", true);
> w.write( method.getResponseBodyAsString () );
> w.close();
> result = new HttpResponse ( method.getResponseBodyAsString ()
> );
> } else {
> System.out.println ( "Unexpected failure: " +
> method.getStatusLine ().toString () );
> }
> method.releaseConnection ();
>
> return result;
> }
>
> }
>
>
>
>
--
,,,,
/'^'\
( o o )
--oOOO--(_)--OOOo------------------------------------------------
| Andrew A. Sabitov
| Email: [EMAIL PROTECTED]
| WWW: fir.catalysis.nsk.su/~sabitov
| .oooO A hedgehog is a proud bird: it won't fly until you kick it!
| ( ) Oooo.
---\ (----( )-------------------------------------------------
\_) ) /
(_/