I am going to use HttpClient to fetch the source of a web page, and I hope
to extract certain information from it. The page is all HTML, and I am
looking for the values between these tags:

//... HTML Stuff here
</td>

        <td class="alt1">(Simple 2 digit number I need here)</td>
        
        
</tr><tr align="center">
//... More HTML Stuff after this as well
</td>

        <td class="alt1">(Simple 2 digit number I need here)</td>
        
        
</tr><tr align="center">
//... HTML Stuff after this as well
Etc.

I think I will have to search through the result of
method.getResponseBody() for text that begins with </td> <td class="alt1">
and ends with </tr><tr align="center">, and take the data in between.
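A minimal sketch of that search, assuming the page has already been decoded into a String and that the value is always one or two digits. The class name AltCellExtractor and the method extract are made up for illustration; only java.util.regex from the standard library is used. The pattern looks for the alt1 cell followed (after optional whitespace) by the closing </tr>, which is the shape shown in the sample above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of HttpClient.
class AltCellExtractor {

    // <td class="alt1">NN</td>, optional whitespace (including newlines), then </tr>.
    // Group 1 captures the one- or two-digit number.
    private static final Pattern ALT1_BEFORE_ROW_END = Pattern.compile(
            "<td\\s+class=\"alt1\">\\s*(\\d{1,2})\\s*</td>\\s*</tr>",
            Pattern.CASE_INSENSITIVE);

    // Returns every captured number, in page order.
    static List<String> extract(String html) {
        List<String> values = new ArrayList<String>();
        Matcher m = ALT1_BEFORE_ROW_END.matcher(html);
        while (m.find()) {
            values.add(m.group(1));
        }
        return values;
    }

    public static void main(String[] args) {
        String sample = "</td>\n\n        <td class=\"alt1\">42</td>\n        \n"
                + "</tr><tr align=\"center\">\n";
        System.out.println(extract(sample));
    }
}
```

A regular expression like this is fragile against markup changes; if the page layout varies, a real HTML parser would probably be the safer choice.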

Am I right in thinking that I can't search through the response a line at a
time? Do I have to wait until the entire source has come in and then search
through one massive string?
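For what it's worth, a line-at-a-time search looks possible as long as each value sits on a single line, as in the sample above. The sketch below reads from any java.io.Reader and checks each line for the alt1 cell on its own; the class name LineScanner and the method scan are hypothetical. It assumes the cell never wraps across lines, so a pattern that spans several lines (such as one that also requires the following </tr>) would not work this way.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of HttpClient.
class LineScanner {

    // A single alt1 cell holding a one- or two-digit number.
    private static final Pattern ALT1_CELL =
            Pattern.compile("<td\\s+class=\"alt1\">\\s*(\\d{1,2})\\s*</td>");

    // Reads the source line by line and collects every matching number.
    static List<String> scan(Reader source) throws IOException {
        List<String> values = new ArrayList<String>();
        BufferedReader reader = new BufferedReader(source);
        String line;
        while ((line = reader.readLine()) != null) {
            Matcher m = ALT1_CELL.matcher(line);
            while (m.find()) {
                values.add(m.group(1));
            }
        }
        return values;
    }

    public static void main(String[] args) throws IOException {
        String sample = "</td>\n\n        <td class=\"alt1\">17</td>\n"
                + "</tr><tr align=\"center\">\n";
        System.out.println(scan(new StringReader(sample)));
    }
}
```

With Commons HttpClient 3.x, the reader could be built from the response stream instead of a byte array, for example new InputStreamReader(method.getResponseBodyAsStream(), method.getResponseCharSet()), so the page would not need to be held in memory as one string.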

Anyway, once I have the data I want to put it into a text file, which I can
already do.
Here's the code so far:

import java.io.*;
import org.apache.commons.httpclient.*;
import org.apache.commons.httpclient.methods.*;
import org.apache.commons.httpclient.params.HttpMethodParams;

public class HttpClientTutorial {

  private static String url = "http://www.youngcoders.com/memberlist.php";

  public static void main(String[] args) {
    // Create an instance of HttpClient.
    HttpClient client = new HttpClient();

    // Create a method instance.
    GetMethod method = new GetMethod(url);

    // Provide a custom retry handler if necessary.
    method.getParams().setParameter(HttpMethodParams.RETRY_HANDLER,
                new DefaultHttpMethodRetryHandler(3, false));

    try {
      // Execute the method.
      int statusCode = client.executeMethod(method);

      if (statusCode != HttpStatus.SC_OK) {
        System.err.println("Method failed: " + method.getStatusLine());
      }

      // Read the response body.
      byte[] responseBody = method.getResponseBody();

      // Deal with the response.
      // Use caution: ensure correct character encoding and that the body
      // is not binary data.
      String page = new String(responseBody, method.getResponseCharSet());

      File outFile = new File("age.html");  // output file name

      BufferedWriter writer = new BufferedWriter(new FileWriter(outFile));
      writer.write(page);
      writer.close();

      System.out.println(page);

    } catch (HttpException e) {
      System.err.println("Fatal protocol violation: " + e.getMessage());
      e.printStackTrace();
    } catch (IOException e) {
      System.err.println("Fatal transport error: " + e.getMessage());
      e.printStackTrace();
    } finally {
      // Release the connection.
      method.releaseConnection();
    }
  }
}

At the moment that just fetches the entire web page and puts it in an .html
file, but how do I get only certain bits from the page?

Thanks for your time. If anything is unclear, just tell me and I'll try to
explain better.
