Hello,
I forgot to attach the program last time. Here it is.
I also found that when I replace the <font face=.....> in the HTML source file with <font font-family=....>, it works.
 
Chao Liu 
Title: JDC Tech Tips: September 23, 1999

Tech Tips

September 23, 1999

This issue presents tips, techniques, and sample code for the following topics:

Extracting Links from an HTML File
Sorting Arrays

This issue of the JDC Tech Tips is written by Patrick Chan, the author of the publication "The Java(TM) Developers Almanac".


Extracting Links from an HTML File

There are many applications that fetch an HTML page from the Web and then extract the links from the page. For example, a link-checker application fetches a page, extracts the links, and then checks the links to see if they refer to actual pages.

The HTML 3.2 support in the Java(TM) 2 platform makes it fairly easy to find and parse links. This tip demonstrates how to use that support.

The first step is to create an editor kit. The purpose of an editor kit is to parse data in some format, such as HTML or RTF, and store the information in a data structure that fully represents the data. This data structure, called a Document, allows you to examine and modify the data in a convenient way.

Let's look at an example. In the following example program, we're going to examine the HTML data in a Document object. The program looks for A (anchor) tags and extracts the HREF attribute information from these tags.

import java.io.*;
import java.net.*;
import javax.swing.text.*;
import javax.swing.text.html.*;

class GetLinks {
  public static void main(String[] args) {
    EditorKit kit = new HTMLEditorKit();
    Document doc = kit.createDefaultDocument();

    // The Document class does not yet
    // handle charsets properly.
    doc.putProperty("IgnoreCharsetDirective", 
      Boolean.TRUE);
    try {

      // Create a reader on the HTML content.
      Reader rd = getReader(args[0]);

      // Parse the HTML.
      kit.read(rd, doc, 0);

      // Iterate through the elements 
      // of the HTML document.
      ElementIterator it = new ElementIterator(doc);
      javax.swing.text.Element elem;
      while ((elem = it.next()) != null) {
        SimpleAttributeSet s = (SimpleAttributeSet)
          elem.getAttributes().getAttribute(HTML.Tag.A);
        if (s != null) {
          System.out.println(
            s.getAttribute(HTML.Attribute.HREF));
        }
      }
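    } catch (Exception e) {
      e.printStackTrace();
      // Report failure with a non-zero exit status.
      System.exit(1);
    }
    // Exit explicitly with a success status.
    System.exit(0);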
  }

  // Returns a reader on the HTML data. If 'uri' begins
  // with "http:", it's treated as a URL; otherwise,
  // it's assumed to be a local filename.
  static Reader getReader(String uri)
    throws IOException {
    if (uri.startsWith("http:")) {

      // Retrieve from Internet.
      URLConnection conn =
        new URL(uri).openConnection();
      return new
        InputStreamReader(conn.getInputStream());
    } else {

      // Retrieve from file.
      return new FileReader(uri);
    }
  }
}

This program takes one parameter from the command line. If the parameter starts with "http:", the program treats the parameter as a URL and fetches the HTML from that URL. Otherwise, the parameter is treated as a filename and the HTML is read from that file.

For example,

$ java GetLinks http://java.sun.com

retrieves the HTML from the main page at java.sun.com.

The editor kit is an HTMLEditorKit object that contains an HTML parser. It creates a Document object that can represent HTML. And it's the editor kit's read() method that parses the HTML and stores the information in the Document.

Once the HTML data is saved in the Document object, we're ready to look for links. This is done by creating an iterator (using ElementIterator) that iterates over all the visible text pieces (called elements) in the HTML. For each text piece, we check to see if it has been formatted for linking, in other words, whether the text is formatted with the A (anchor) tag. We do this by calling getAttributes().getAttribute(HTML.Tag.A). If the text piece has been formatted with the A tag, the method call returns the set of attributes of the A tag used to format that text piece. Otherwise the method call simply returns null.

Note: The name getAttributes() is a little confusing because it has nothing to do with HTML attributes; the "attributes" in this case are all the HTML tags (such as an A tag) that were used to format that text piece.

Now we have the set of attributes of the A tag used to format a piece of text; it's stored in a SimpleAttributeSet object. So we just need to get the value of the HREF attribute and we're done. We can do this by calling getAttribute(HTML.Attribute.HREF) on the A tag's attribute set.
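
The same traversal can be packaged as a method that returns the links instead of printing them. The following is a minimal sketch and not part of the original tip: the class name LinkLister and the method extractLinks() are hypothetical, and the HTML is parsed from a String through a StringReader so that the example needs no network or file access. It treats the value stored under HTML.Tag.A as an AttributeSet, the interface that SimpleAttributeSet implements.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.AttributeSet;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;
import javax.swing.text.Element;
import javax.swing.text.ElementIterator;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;

class LinkLister {
  // Hypothetical helper: returns the HREF value of every element
  // that carries A-tag formatting, in document order.
  static List<String> extractLinks(String html)
      throws IOException, BadLocationException {
    HTMLEditorKit kit = new HTMLEditorKit();
    Document doc = kit.createDefaultDocument();
    doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);

    // Parse the HTML held in the string.
    kit.read(new StringReader(html), doc, 0);

    List<String> links = new ArrayList<String>();
    ElementIterator it = new ElementIterator(doc);
    Element elem;
    while ((elem = it.next()) != null) {
      Object a = elem.getAttributes().getAttribute(HTML.Tag.A);
      if (a instanceof AttributeSet) {
        Object href =
          ((AttributeSet) a).getAttribute(HTML.Attribute.HREF);
        if (href != null) {
          links.add(href.toString());
        }
      }
    }
    return links;
  }

  public static void main(String[] args) throws Exception {
    // Sample markup invented for this sketch.
    String html = "<html><body><p><a href=\"a.html\">one</a> and "
      + "<a href=\"http://example.com/\">two</a></p></body></html>";
    for (String link : extractLinks(html)) {
      System.out.println(link);
    }
  }
}
```

Because the check is made per text piece, an anchor whose text is split into several differently formatted pieces may report the same HREF more than once; a caller that needs unique links could collect them in a Set instead.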


Sorting Arrays

This tip discusses how you can sort data in arrays. Sorting arrays of primitive types is easy. There are seven methods in the class Arrays for sorting arrays of each of the seven primitive types: byte, char, double, float, int, long, and short. Here's an example that sorts an array of doubles.

import java.util.*;

class Sort1 {
  // Sorts an array of random double values.
  public static void main(String[] args) {
    double[] dblarr = new double[10];
    for (int i=0; i<dblarr.length; i++) {
      dblarr[i] = Math.random();
    }
        
    // Sort the array.
    Arrays.sort(dblarr);
    //Print the array
    for (int i=0; i<dblarr.length; i++){
      System.out.println(dblarr[i]);
    }
  }
}
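
For a deterministic illustration, here is a minimal sketch showing that Arrays.sort() reorders a primitive array in place, in ascending order. The class name SortPrimitives, the helper sortedCopy(), and the sample values are made up for this example. Note that Arrays.toString() was added in a later release of the platform than the one this tip was written for.

```java
import java.util.Arrays;

class SortPrimitives {
  // Hypothetical helper: returns a sorted copy so that the
  // caller's array is left untouched.
  static int[] sortedCopy(int[] values) {
    int[] copy = values.clone();
    Arrays.sort(copy);   // sorts the copy in place, ascending
    return copy;
  }

  public static void main(String[] args) {
    int[] values = {42, -7, 19, 0, 19};
    System.out.println(Arrays.toString(sortedCopy(values)));
    // prints [-7, 0, 19, 19, 42]

    char[] letters = {'d', 'a', 'c', 'b'};
    Arrays.sort(letters);
    System.out.println(new String(letters));
    // prints abcd
  }
}
```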

Sorting an array of objects is just as easy if the objects implement the Comparable interface, java.lang.Comparable. This interface gives a natural ordering for a class so that objects of that class can be sorted. Here's an example that sorts an array of String objects; the String class implements Comparable.

import java.util.*;

class Sort2 {
  // Sorts the arguments in args.
  public static void main(String[] args) {
    Arrays.sort(args);
    //Print the arguments in args
    for (int i=0; i<args.length; i++){
      System.out.println(args[i]);
    }
  }
}

What if the objects do not implement Comparable? Well, you've got two choices: you can modify the objects to implement Comparable, or you can supply a Comparator to the sort method. Let's look at the first option first.

To make an object comparable, you need to add Comparable to the implements list of the object's class and then implement the compareTo() method. The compareTo() method compares the object with another object of the same type. If the object should appear before the other object, compareTo() should return a negative number. If the object should appear after the other object, compareTo() should return a positive number. If the two objects are equal in the ordering, compareTo() should return zero.

Point is an AWT class that is not comparable. The following example creates a version of Point that is comparable. It sorts points by distance from the origin.

import java.util.*;
import java.awt.*;

class MyPoint extends java.awt.Point implements 
  Comparable {
  MyPoint(int x, int y) {
    super(x, y);
  }
  public int compareTo(Object o) {
    MyPoint p = (MyPoint)o;
    double d1 = Math.sqrt(x*x + y*y);
    double d2 = Math.sqrt(p.x*p.x + p.y*p.y);
    if (d1 < d2) {
      return -1;
    } else if (d2 < d1) {
      return 1;
    } 
    return 0;
  }
}
class Sort3 {
  public static void main(String[] args) {
    Random rnd = new Random();
    MyPoint[] points = new MyPoint[10];
    for (int i=0; i<points.length; i++) {
      points[i] = new MyPoint(rnd.nextInt(100), 
        rnd.nextInt(100));
    }
    Arrays.sort(points);
    //Print the points
    for (int i=0; i<points.length; i++){
      System.out.println(points[i]);
    }
  }
}
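
On releases of the platform that support generics (an assumption; the tip itself predates them), the same idea can be written without the cast in compareTo(). The sketch below uses hypothetical class names, OriginPoint and SortOriginPoints, and compares squared distances, which yields the same ordering as comparing the square roots.

```java
import java.awt.Point;
import java.util.Arrays;

class OriginPoint extends Point implements Comparable<OriginPoint> {
  OriginPoint(int x, int y) {
    super(x, y);
  }

  // Squared distance from the origin; long arithmetic avoids
  // int overflow for large coordinates.
  long distSq() {
    return (long) x * x + (long) y * y;
  }

  @Override
  public int compareTo(OriginPoint other) {
    // Negative, zero, or positive, as the contract requires.
    return Long.compare(distSq(), other.distSq());
  }
}

class SortOriginPoints {
  public static void main(String[] args) {
    // Sample points invented for this sketch.
    OriginPoint[] points = {
      new OriginPoint(3, 4),
      new OriginPoint(1, 1),
      new OriginPoint(0, 2)
    };
    Arrays.sort(points);
    for (OriginPoint p : points) {
      System.out.println(p.x + "," + p.y);
    }
    // prints 1,1 then 0,2 then 3,4
  }
}
```

With this ordering, two different points at the same distance from the origin compare as equal even though Point.equals() reports them as different, so the ordering is not consistent with equals.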

If you can't or don't want to make an object Comparable, you can supply a Comparator object to the Arrays.sort() method. The Comparator object must implement a method called compare(). The compare() method follows the same return-value rules as the compareTo() method of the Comparable interface, except that it receives both objects to compare as arguments.

The next example is similar to the one above. However, instead of creating a special kind of Point, we create a comparator that can sort Point objects.

import java.util.*;
import java.awt.*;

class PointComparator implements Comparator {
  public int compare(Object o1, Object o2) {
    Point p1 = (Point)o1;
    Point p2 = (Point)o2;
    double d1 = Math.sqrt(p1.x*p1.x + p1.y*p1.y);
    double d2 = Math.sqrt(p2.x*p2.x + p2.y*p2.y);
    if (d1 < d2) {
      return -1;
    } else if (d2 < d1) {
      return 1;
    } 
    return 0;
  }
    
}
class Sort4 {
  public static void main(String[] args) {
    Random rnd = new Random();
    Point[] points = new Point[10];
    for (int i=0; i<points.length; i++) {
      points[i] = new Point(rnd.nextInt(100), 
        rnd.nextInt(100));
    }
    Arrays.sort(points, new PointComparator());
    //Print the points
    for (int i=0; i<points.length; i++){
      System.out.println(points[i]);
    }
  }
}
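
On Java 8 or later (again an assumption about the reader's platform), a comparator like this can be built from a key-extracting function instead of a separate class. This sketch uses Comparator.comparingDouble() together with the distance() method that Point inherits from Point2D; the class name SortByDistance and the helper sortByDistance() are hypothetical.

```java
import java.awt.Point;
import java.util.Arrays;
import java.util.Comparator;

class SortByDistance {
  // Orders points by their distance from the origin (0, 0).
  static final Comparator<Point> BY_DISTANCE =
    Comparator.comparingDouble(p -> p.distance(0, 0));

  // Hypothetical helper: returns a sorted copy of the array.
  static Point[] sortByDistance(Point[] points) {
    Point[] copy = points.clone();
    Arrays.sort(copy, BY_DISTANCE);
    return copy;
  }

  public static void main(String[] args) {
    // Sample points invented for this sketch.
    Point[] points = {
      new Point(3, 4), new Point(1, 1), new Point(0, 2)
    };
    for (Point p : sortByDistance(points)) {
      System.out.println(p.x + "," + p.y);
    }
    // prints 1,1 then 0,2 then 3,4
  }
}
```

Keeping the ordering in a Comparator leaves the Point class untouched, which is the same design choice as PointComparator above, just expressed more compactly.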

! Note !

The names on the JDC(SM) mailing list are used for internal Sun Microsystems(TM) purposes only. To remove your name from the list, see Subscribe/Unsubscribe below.

! Feedback !

Comments? Send your feedback on the JDC Tech Tips to: jdc-webmaster

! Subscribe/Unsubscribe !

The JDC Tech Tips are sent to you because you elected to subscribe when you registered as a JDC member. To unsubscribe from JDC email, go to the following address and enter the email address you wish to remove from the mailing list:

http://developer.java.sun.com/unsubscribe.html

To become a JDC member and subscribe to this newsletter go to:

http://java.sun.com/jdc/


[ This page was updated: 10-Apr-2001 ]
Sun Microsystems, Inc.
Copyright © 1995-2001 Sun Microsystems, Inc.
All Rights Reserved.