On Wed, 15 Jun 2005, Roland Weber wrote:

> Hello Andrew,
> 
> 1. Use HttpMethod.getResponseBodyAsStream().
Yes, I do that now.

> 2. Are you sure that the question marks are actually in the
>    string? It could be that they appear only when you try to
>    *print* the string.

It is ASCII code 0x3F, so it really is a '?' character.
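
One way to make that check without relying on a console or file writer is to dump the numeric value of each char. The sketch below is only an illustration: the class name CharDump and the helper hexDump are made up for this example, and the sample text is an assumption, not data from the Amazon page. It also shows that encoding Japanese text into a charset that cannot represent it yields 0x3F bytes.

```java
import java.nio.charset.StandardCharsets;

public class CharDump {

    // Returns the UTF-16 code units of s as space-separated 4-digit hex values.
    static String hexDump(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical sample text (two kanji), written with escapes so the
        // source file encoding does not matter.
        String japanese = "\u65E5\u672C";
        System.out.println(hexDump(japanese));

        // String.getBytes(Charset) replaces unmappable characters with the
        // charset's replacement byte, which is '?' (0x3F) for US-ASCII.
        byte[] ascii = japanese.getBytes(StandardCharsets.US_ASCII);
        System.out.println(hexDump(new String(ascii, StandardCharsets.US_ASCII)));
    }
}
```

If the dump of the string returned by the library already shows 003F, the substitution happened before printing; if it shows other values, the loss happens at output time.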

> 
> 3. If the question marks really are in the string, the server
>    probably sent an inappropriate charset value, or none
>    at all. Anyway, it's better to do 1) and parse the HTML
>    code for a charset specification. You'll have to parse
>    it anyway in your robot.

The server sends Shift_JIS as the page charset.
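
For servers that do not send a charset, the suggestion in point 3 (look for a charset specification inside the HTML) could be sketched roughly as follows. This is an assumption-laden illustration, not part of HttpClient: the class MetaCharsetSniffer, the method sniff, the 4096-byte limit and the regular expression are all invented here, and the approach assumes the page uses an ASCII-compatible encoding for its markup.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MetaCharsetSniffer {

    // Matches charset=... inside a <meta ...> tag, case-insensitively.
    private static final Pattern META_CHARSET = Pattern.compile(
            "<meta[^>]*?charset\\s*=\\s*[\"']?([\\w.:-]+)",
            Pattern.CASE_INSENSITIVE);

    // Looks at the first bytes of the raw body and returns the declared
    // charset name, or null if no declaration is found.
    static String sniff(byte[] body) {
        int len = Math.min(body.length, 4096);
        // ISO-8859-1 maps every byte to a char, so decoding never fails;
        // only the ASCII markup matters for the search.
        String head = new String(body, 0, len, StandardCharsets.ISO_8859_1);
        Matcher m = META_CHARSET.matcher(head);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String html = "<html><head><meta http-equiv=\"Content-Type\" "
                + "content=\"text/html; charset=Shift_JIS\"></head></html>";
        System.out.println(sniff(html.getBytes(StandardCharsets.ISO_8859_1)));
    }
}
```

A real robot would probably prefer the header value when present and use the sniffed value only as a fallback.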

Here is my code now:

............
result = new HttpResponse ( method.getResponseBodyAsStream (),
                            method.getResponseCharSet() );
.........

// in the HttpResponse constructor:
HttpResponse ( InputStream responseBodyAsStream, String charset )
        throws IOException {
    BufferedReader reader = new BufferedReader (
            new InputStreamReader ( responseBodyAsStream, charset ) );
    String line = null;
    while ( ( line = reader.readLine() ) != null ) {
        this.add( line );
        out.write( line );
        out.write( "\n" );
    }
}

It works. :)
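
For anyone who wants to try the same idea without HttpClient on the classpath, here is a minimal self-contained sketch. The class StreamDecoder and the method readLines are hypothetical names for this example, and the ISO-8859-1 fallback for a missing charset is an assumption of the sketch. It also assumes the JDK in use ships the Shift_JIS charset, which full JDK distributions normally do.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class StreamDecoder {

    // Reads the stream line by line, decoding bytes with the given charset.
    // Falls back to ISO-8859-1 when no charset name is supplied.
    static List<String> readLines(InputStream in, String charset) throws IOException {
        String effective = (charset != null) ? charset : "ISO-8859-1";
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader =
                     new BufferedReader(new InputStreamReader(in, effective))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Simulate a Shift_JIS response body (three kanji, then an ASCII line).
        byte[] body = "\u65E5\u672C\u8A9E\nabc".getBytes("Shift_JIS");
        List<String> lines = readLines(new ByteArrayInputStream(body), "Shift_JIS");
        System.out.println(lines.size() + " lines, first line has "
                + lines.get(0).length() + " chars");
    }
}
```

The point of the design is that the decoding charset is chosen explicitly by the caller instead of being left to the platform default.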

It's funny, but 
http://jakarta.apache.org/commons/httpclient/3.0/charencodings.html
says: "If the response is known to be a String, you can use the 
getResponseBodyAsString method which will automatically use the encoding 
specified in the Content-Type header or ISO-8859-1 if no charset is 
specified."

The Content-Type for this page is "text/html; charset=Shift_JIS", so I really
thought that HttpClient would convert the body automatically... :(
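
To make the documented rule concrete (use the charset from the Content-Type header, otherwise ISO-8859-1), here is a rough sketch of such a lookup. It is not HttpClient's actual implementation; the class ContentTypeCharset and the method charsetOf are hypothetical, and the parsing is deliberately simplistic.

```java
import java.util.Locale;

public class ContentTypeCharset {

    // Extracts the charset parameter from a Content-Type header value,
    // returning the fallback when the header or the parameter is missing.
    static String charsetOf(String contentType, String fallback) {
        if (contentType == null) {
            return fallback;
        }
        String[] parts = contentType.split(";");
        // parts[0] is the media type; parameters start at index 1.
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = param.substring("charset=".length()).trim();
                // Strip optional surrounding double quotes.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? fallback : value;
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html; charset=Shift_JIS", "ISO-8859-1"));
        System.out.println(charsetOf("text/html", "ISO-8859-1"));
    }
}
```

Comparing the value obtained this way with what method.getResponseCharSet() reports would show whether the header is being read as expected.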




> 
> hope that helps,
>   Roland
> 
> 
> 
> 
> 
> "Andrew A. Sabitov" <[EMAIL PROTECTED]>
> 15.06.2005 06:20
> Please respond to: "HttpClient User Discussion"
> To: [email protected]
> Subject: Japanese charset?
> 
> Hi all!
> 
> Could anybody be so kind as to help me? I need to build a robot that will
> fetch some data from amazon.co.jp. It will run under Linux.
> 
> 
> This URL is my starting point:
> http://s1.amazon.co.jp/exec/varzea/subst/your-account/downloadable-reports.html
> 
> 
> The class that downloads the page is below. The problem is that
> method.getResponseBodyAsString() returns a string in which all Japanese
> characters are replaced by question marks.
> 
> How can I fix this problem?
> 
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> 
> import java.io.FileWriter;
> import java.io.IOException;
> 
> import org.apache.commons.httpclient.Cookie;
> import org.apache.commons.httpclient.HostConfiguration;
> import org.apache.commons.httpclient.HttpConnection;
> import org.apache.commons.httpclient.HttpException;
> import org.apache.commons.httpclient.HttpState;
> import org.apache.commons.httpclient.HttpStatus;
> import org.apache.commons.httpclient.URI;
> import org.apache.commons.httpclient.protocol.Protocol;
> import org.apache.commons.httpclient.cookie.CookiePolicy;
> import org.apache.commons.httpclient.methods.GetMethod;
> 
> import ru.pp.sabitov.common.HttpResponse;
> 
> public class Client {
> 
>     private String         url        = null;
> 
>     private HttpConnection connection = null;
>     private Cookie[]       cookies    = null;
> 
>     private String         proxyHost  = null;
>     private int            proxyPort  = -1;
> 
>     public Client () {
> 
>     public void setProxy ( String host, String port ) {
> 
>     public void setProxy ( String host, int port ) {
> 
>     public HttpResponse openGetHttpConnection ( String url )
>             throws NullPointerException, HttpException, IOException {
>         HttpResponse result = null;
> 
>         System.out.println ( url );
>  
>         URI uri = new URI ( url.toCharArray () );
> 
>         String schema = uri.getScheme ();
>         if ( ( schema == null ) || ( schema.equals ( "" ) ) ) {
>             schema = "http";
>         }
>         Protocol protocol = Protocol.getProtocol ( schema );
> 
>         HttpState state = new HttpState ();
>         state.setCookiePolicy ( CookiePolicy.RFC2109 );
>         if ( cookies != null ) {
>             for ( int idx = 0; idx < cookies.length; idx++ ) {
>                 Cookie cookie = cookies [ idx ];
>                 System.out.println ( "Cookie: " + cookie );
>                 state.addCookie ( cookie );
>             }
>         }
> 
>         String host = uri.getHost ();
>         int port = uri.getPort ();
>         GetMethod method = new GetMethod ( uri.toString () );
>         method.setFollowRedirects ( true );
>  
>         HostConfiguration hostConfig = new HostConfiguration();
>         if ( ( proxyHost != null ) && ( proxyPort != -1 ) ) {
>             hostConfig.setProxy( proxyHost, proxyPort );
>         }
> 
>         org.apache.commons.httpclient.HttpClient client =
>                 new org.apache.commons.httpclient.HttpClient ();
>         client.setHostConfiguration( hostConfig );
>         client.setState ( state );
>         client.executeMethod( method );
> 
>         if ( method.getStatusCode() == HttpStatus.SC_OK ) {
>             cookies = client.getState().getCookies ();
>             FileWriter w = new FileWriter ("123.txt", true);
>             w.write( method.getResponseBodyAsString () );
>             w.close();
>             result = new HttpResponse ( method.getResponseBodyAsString () );
>         } else {
>             System.out.println ( "Unexpected failure: "
>                     + method.getStatusLine ().toString () );
>         }
>         method.releaseConnection ();
> 
>         return result;
>     }
> 
> }
> 
> 
> 
> 

-- 
       ,,,,
       /'^'\
      ( o o )
--oOOO--(_)--OOOo------------------------------------------------
|                  Andrew A. Sabitov
|                  Email: [EMAIL PROTECTED]
|                  WWW:   fir.catalysis.nsk.su/~sabitov
| .oooO   A hedgehog is a proud bird: it won't fly until you kick it!
| (   )   Oooo.
---\ (----(   )-------------------------------------------------
    \_)    ) /
          (_/
