I'm not an expert on Java (yet) :)  but I was wondering if there is a way
to access the Internet Explorer interface and/or the HTML Document Object
Model from it.  With that, as far as the links are concerned, you just access
the "Links" node of the Document object.  All the links are already extracted,
so no parsing is required.  The Document object also gives you access to all
the elements on a page.  ... at least this is the way it is in Visual Basic
;-)
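
As a rough illustration of the "let a parser hand you the links" idea in
Java: the sketch below does not touch Internet Explorer or its Document
object at all. It assumes the HTML parser that ships with the JDK's Swing
classes as a stand-in, and the class and method names (LinkExtractor,
extractLinks) are made up for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: collects link targets using the JDK's own HTML parser
// (an assumption standing in for IE's Document.Links collection).
public class LinkExtractor {

    // Returns the href value of every <a> start tag, in document order.
    public static List<String> extractLinks(String html) throws IOException {
        final List<String> links = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.A) {
                    Object href = attrs.getAttribute(HTML.Attribute.HREF);
                    if (href != null) {
                        links.add(href.toString());
                    }
                }
            }
        };
        // The final 'true' tells the parser to ignore charset directives in the page.
        new ParserDelegator().parse(new StringReader(html), callback, true);
        return links;
    }

    public static void main(String[] args) throws IOException {
        String page = "<html><body>"
                    + "<a href=\"http://example.com/\">one</a>"
                    + "<a href=\"two.html\">two</a>"
                    + "</body></html>";
        for (String link : extractLinks(page)) {
            System.out.println(link);
        }
    }
}
```

The parser is fairly forgiving of sloppy markup, but a spider would still
have to resolve relative links against the page URL and decide what to do
with frames, image maps and the like, none of which is shown here.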

Jim MacDiarmid
Senior Software Engineer - Pacel Corp.
Manassas, Virginia... USA
www.pacelcorp.com




> -----Original Message-----
> From: Tim Bray [SMTP:[EMAIL PROTECTED]]
> Sent: Sunday, June 10, 2001 1:41 PM
> To:   [EMAIL PROTECTED]
> Subject:      [Robots] Re: Robots.txt  (was: Hello)
> 
> 
> At 02:53 PM 09/06/01 -0700, Ed Bockelman wrote:
> >> I am java programmer, presently making a web spider
> >> program for a search engine..
> >
> >I wouldn't think of Java as the first choice for a high-volume web
> >spider.  What are the advantages?
> 
> It's a nice programming language, and has a pretty good
> net interface library.  The only downside is that a spider 
> spends a huge amount of its time picking apart page content
> looking for links and so on, and has to deal with all the
> badly broken HTML out there.  This is probably easier in
> perl or python.  But then a spider has to be massively
> parallel and java's threading is massively better than
> perl's. -T
> 
> 
> --
> This message was sent by the Internet robots and spiders discussion list
> ([EMAIL PROTECTED]).  For list server commands, send "help" in the body
> of a message to "[EMAIL PROTECTED]".

