Hello,

 

I’ve recently integrated HTTPClient 3.0-rc2 into an application that uses a popular proxy service, Anonymizer.com. Unfortunately, HTTPClient sometimes returns web pages with portions of the page corrupted. If I load the same pages in any web browser (IE, Firefox, Opera), I never see the corruption.

I was originally using plain sockets and java.net to find, connect to, and retrieve these pages, but after switching to Anonymizer I ran into problems parsing chunked content. I had written a regular expression to find and discard the chunk identifiers (as opposed to reading the page based on the chunk identifiers), but it would occasionally miss some of the hex identifiers. I cannot find anything wrong with the regular expression, so I suspect that the proxy was not returning data per the RFC.

Regardless, I decided to switch over to HTTPClient for several reasons, one of which is its transparent reading of chunked data. Even after implementing it (and I hope I followed the tutorials, docs, and sample code as closely as possible), I’m still getting corrupt data. I’ve looked through the user mailing list and most of the dev mailing list and have not found a similar problem reported.
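
For reference, on the chunk-identifier approach mentioned above: a regular expression cannot reliably tell a chunk-size line from body text that happens to look like one, so the usual approach is to read each hex size and then consume exactly that many bytes. Below is a minimal sketch of that idea using only java.io. The class and method names (ChunkedBodySketch, decode) are hypothetical, trailers are ignored, and the code is not taken from HTTPClient's source.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only.
class ChunkedBodySketch {

    // Reads one line terminated by LF and returns it without CR/LF.
    private static String readLine(InputStream in) throws IOException {
        StringBuilder line = new StringBuilder();
        int b;
        while ((b = in.read()) != -1 && b != '\n') {
            if (b != '\r') {
                line.append((char) b);
            }
        }
        return line.toString();
    }

    // Decodes a chunked body: hex size line, that many bytes, CRLF,
    // repeated until a chunk of size 0 is seen.
    static byte[] decode(InputStream in) {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        try {
            while (true) {
                String sizeLine = readLine(in);
                int semi = sizeLine.indexOf(';');   // drop optional chunk extensions
                if (semi >= 0) {
                    sizeLine = sizeLine.substring(0, semi);
                }
                int size = Integer.parseInt(sizeLine.trim(), 16);
                if (size == 0) {
                    break;                          // last chunk; trailers not handled
                }
                byte[] chunk = new byte[size];
                int off = 0;
                while (off < size) {                // read() may return fewer bytes than asked
                    int n = in.read(chunk, off, size - off);
                    if (n == -1) {
                        throw new IOException("stream ended inside a chunk");
                    }
                    off += n;
                }
                body.write(chunk, 0, size);
                readLine(in);                       // consume the CRLF after the chunk data
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return body.toByteArray();
    }

    public static void main(String[] args) {
        // The second chunk's data ("1a" plus CRLF) looks like a size line itself.
        String wire = "5\r\nhello\r\n4\r\n1a\r\n\r\n0\r\n\r\n";
        byte[] out = decode(new ByteArrayInputStream(
                wire.getBytes(StandardCharsets.US_ASCII)));
        System.out.println("decoded " + out.length + " bytes");
    }
}
```

In the sample input, the second chunk's payload is a line that reads "1a", which a pattern that strips hex-looking lines would remove or misread; a length-driven reader keeps it as body data.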

 

So my questions, if anyone can help, are:

1) Should HTTPClient 3.0 return data as well as any web browser?
2) Has anyone run into similar problems with proxy services?
3) Does anyone have fine-tuning tips for using proxies?
4) Or tips for reading chunked data?


Below is a snippet of the code that connects to the proxy and retrieves data. Note that I did not follow the sample proxy code found in the rc2 source, because I sometimes need to connect to Google and Overture, both of which return 502 – Forbidden pages when I connect using that particular method. Instead I opted for the tutorial method of connecting through proxies.

 

Also attached is one of the corrupted pages that was returned. View the page source; the corrupted characters start about 70 lines down.

 

Thanks in advance for any pointers or responses.

 

Chris

 

    private HttpConnectionEngine(String pHost, int pPort,
            HttpConnectionEngineParams pConnEngineParams) {
        HttpConnectionManagerParams connManagerParams = new HttpConnectionManagerParams();
        connManagerParams.setDefaultMaxConnectionsPerHost(pConnEngineParams
                .getMaxConnectionsPerHost());
        connManagerParams.setMaxTotalConnections(pConnEngineParams
                .getMaxConnectionsPerHost());
        connManagerParams.setStaleCheckingEnabled(pConnEngineParams
                .isConnectionStaleCheckingEnabled());
        connManagerParams.setConnectionTimeout((int) pConnEngineParams
                .getIdleConnectionTimeout());

        cConnManager = new MultiThreadedHttpConnectionManager();
        cConnManager.setParams(connManagerParams);
        cConnManager.closeIdleConnections(pConnEngineParams
                .getIdleConnectionTimeout());
        cConnEngineParams = pConnEngineParams;
        cHostConfig = new HostConfiguration();
        cHostConfig.setHost("www.google.com"); // example host that sometimes returns corrupt webpage
        cHostConfig.setProxy("quinstreet.anonymizer.com", 80); // proxy host
    }

 

    public void readFromServer(String pRequest, StringBuffer pWebPage)
            throws InvalidArgumentException {

        final String METHOD = "readFromServer()";

        int status = -1, lReadLine = -1;
        int lBufSize = 4 * 1024;
        char[] lHtmlBuf = new char[lBufSize];
        GetMethod lRequestMethod = null;
        InputStreamReader lIn = null;
        HttpClient lClient = new HttpClient();
        lClient.setHttpConnectionManager(cConnManager);
        lClient.setHostConfiguration(cHostConfig);
        // set number of retries on bad connect
        lClient.getParams().setParameter(
                HttpMethodParams.RETRY_HANDLER,
                new DefaultHttpMethodRetryHandler(cConnEngineParams
                        .getNumOfRetryOnBadHttpStatus(), cConnEngineParams
                        .isRequestSentRetryEnabled()));
        log.log(Level.INFO, "Request: " + pRequest);
        lRequestMethod = new GetMethod(pRequest);

        // add headers to request
        // (variable named headerNames because "enum" is a reserved word since Java 5)
        Properties lReqHeaderProps = cConnEngineParams.getReqHeaderProps();
        Enumeration headerNames = lReqHeaderProps.keys();
        while (headerNames.hasMoreElements()) {
            String key = (String) headerNames.nextElement();
            lRequestMethod.addRequestHeader(key, lReqHeaderProps
                    .getProperty(key));
        }
        // clean StringBuffer
        pWebPage.delete(0, pWebPage.length());
        // execute request
        try {
            status = lClient.executeMethod(lRequestMethod);
            lIn = new InputStreamReader(
                    lRequestMethod.getResponseBodyAsStream(),
                    lRequestMethod.getResponseCharSet()
            );
            // read the body in 4 KB blocks and append each block to pWebPage
            while ((lReadLine = lIn.read(lHtmlBuf)) != -1) {
                pWebPage.append(lHtmlBuf);
                lHtmlBuf = new char[lBufSize];
            }
        } catch (HttpException he) {
            throw new InvalidArgumentException(
                    "HttpException executing GetMethod on request: " + pRequest
                            + ", with: " + he.getMessage());
        } catch (IOException ioe) {
            throw new InvalidArgumentException(
                    "IOException executing request or reading response on request: "
                            + pRequest + ", with: " + ioe.getMessage());
        } finally {
            // clean resources
            // NOTE: don't close connection with HTTP/1.1
            lRequestMethod.releaseConnection();
            lRequestMethod = null;
            lClient = null;
            headerNames = null;
            // check the status for logging
            if (status != HttpStatus.SC_OK)
                log.logp(Level.INFO, CLASS, METHOD, "Bad request, status: "
                        + status);
            else
                log.logp(Level.FINE, CLASS, METHOD,
                        "OK status, webpage-length:" + pWebPage.length());
        }
    }
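
One thing that may be worth ruling out before blaming the proxy: the read loop in readFromServer() above appends the entire 4 KB buffer on every pass, whether or not read() filled it. Reader.read(char[]) can return fewer characters than the buffer holds, and the unused slots of a freshly allocated char[] are NUL characters, which would then end up in the page. The sketch below reproduces that with a plain StringReader and contrasts it with appending only the characters actually read. The class and method names are hypothetical, and no HTTPClient calls are involved.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical helper class for illustration; not part of HTTPClient.
class ReadLoopSketch {

    // Mirrors the loop in readFromServer(): appends the whole buffer on
    // every pass, even when read() filled only part of it.
    static String readAllPadded(Reader in, int bufSize) {
        StringBuilder page = new StringBuilder();
        try {
            char[] buf = new char[bufSize];
            while (in.read(buf) != -1) {
                page.append(buf);            // unfilled slots are '\u0000'
                buf = new char[bufSize];
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return page.toString();
    }

    // Appends only the characters that read() reported.
    static String readAll(Reader in, int bufSize) {
        StringBuilder page = new StringBuilder();
        try {
            char[] buf = new char[bufSize];
            int n;
            while ((n = in.read(buf)) != -1) {
                page.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return page.toString();
    }

    public static void main(String[] args) {
        String body = "0123456789";          // 10 chars, not a multiple of 4
        System.out.println(readAllPadded(new StringReader(body), 4).length()); // 12
        System.out.println(readAll(new StringReader(body), 4).length());       // 10
    }
}
```

If this is the cause, the equivalent change in readFromServer() would be pWebPage.append(lHtmlBuf, 0, lReadLine). Whether it explains the attached page is an assumption; it has not been verified against the proxy.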

 
