Hello Andrew,

1. Use HttpMethod.getResponseBodyAsStream().

2. Are you sure that the question marks are actually in the
   string? It could be that they appear only when you try to
   *print* the string.

3. If the question marks really are in the string, the server
   probably sent an inappropriate charset value, or none
   at all. Anyway, it's better to do 1) and parse the HTML
   code for a charset specification. You'll have to parse
   it anyway in your robot.

hope that helps,
  Roland





"Andrew A. Sabitov" <[EMAIL PROTECTED]> 
15.06.2005 06:20
Please respond to
"HttpClient User Discussion"


To
[email protected]
cc

Subject
Japanese charset?







Hi all!

Could anybody be so kind to help me? I should to make a robot, that will 
fetch some data from amazon.co.jp. It will work under Linux. 


This URL is a point of start for me:
http://s1.amazon.co.jp/exec/varzea/subst/your-account/downloadable-reports.html


There is a class code that downloads page below. The problem is that 
method.getResponseBodyAsString() returns string, where all Japanese chars 
replaced by question-mark. 

How can I fix this problem?

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

import java.io.FileWriter;
import java.io.IOException;

import org.apache.commons.httpclient.Cookie;
import org.apache.commons.httpclient.HostConfiguration;
import org.apache.commons.httpclient.HttpConnection;
import org.apache.commons.httpclient.HttpException;
import org.apache.commons.httpclient.HttpState;
import org.apache.commons.httpclient.HttpStatus;
import org.apache.commons.httpclient.URI;
import org.apache.commons.httpclient.protocol.Protocol;
import org.apache.commons.httpclient.cookie.CookiePolicy;
import org.apache.commons.httpclient.methods.GetMethod;

import ru.pp.sabitov.common.HttpResponse;

public class Client {

    private String         url        = null;

    private HttpConnection connection = null;
    private Cookie[]       cookies    = null;

    private String         proxyHost  = null;
    private int            proxyPort  = -1;

    public Client () {

    public void setProxy ( String host, String port ) {

    public void setProxy ( String host, int port ) {

    public HttpResponse openGetHttpConnection ( String url ) throws 
NullPointerException, HttpException, IOException {
        HttpResponse result = null;

        System.out.println ( url );
 
        URI uri = new URI ( url.toCharArray () );

        String schema = uri.getScheme ();
        if ( ( schema == null ) || ( schema.equals ( "" ) ) ) {
            schema = "http";
        }
        Protocol protocol = Protocol.getProtocol ( schema );

        HttpState state = new HttpState ();
        state.setCookiePolicy ( CookiePolicy.RFC2109 );
        if ( cookies != null ) {
            for ( int idx = 0; idx < cookies.length; idx++ ) {
                Cookie cookie = cookies [ idx ];
                System.out.println ( "Cookie: " + cookie );
                state.addCookie ( cookie );
            }
        }

        String host = uri.getHost ();
        int port = uri.getPort ();
        GetMethod method = new GetMethod ( uri.toString () );
        method.setFollowRedirects ( true );
 
        HostConfiguration hostConfig = new HostConfiguration();
        if ( ( proxyHost != null ) && ( proxyPort != -1 ) ) {
            hostConfig.setProxy( proxyHost, proxyPort );
        }

        org.apache.commons.httpclient.HttpClient client = new 
org.apache.commons.httpclient.HttpClient ();
        client.setHostConfiguration( hostConfig );
        client.setState ( state );
        client.executeMethod( method );

        if ( method.getStatusCode() == HttpStatus.SC_OK ) {
            cookies = client.getState().getCookies ();
            FileWriter w = new FileWriter ("123.txt", true);
            w.write( method.getResponseBodyAsString () );
            w.close();
            result = new HttpResponse ( method.getResponseBodyAsString () 
);
        } else {
            System.out.println ( "Unexpected failure: " + 
method.getStatusLine ().toString () );
        }
        method.releaseConnection ();

        return result;
    }

}



-- 
       ,,,,
       /'^'\
      ( o o )
--oOOO--(_)--OOOo------------------------------------------------
|                  Andrew A. Sabitov
|                  Email: [EMAIL PROTECTED]
|                  WWW:   fir.catalysis.nsk.su/~sabitov
| .oooO   Еж птица гордая - пока не пнешь, не полетит!
| (   )   Oooo.
---\ (----(   )-------------------------------------------------
    \_)    ) /
          (_/

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Reply via email to