Lars J. Nilsson wrote:
> > I wouldn't think of Java as the first choice for a high-volume web spider.
> > What are the advantages?
> - Java's built-in network capabilities

Except that the HttpURLConnection implementations on most Linux versions suck :( Failure to time out properly is one common bug.

> - HTML parsing is a part of the core language

And the HTML parsing is somewhat stricter than it might be, and needs to be expanded to do anything useful.

> - Portability (write once, debug... sorry, run everywhere)

Sort of true: Perl/C can easily be made source-portable, and Java can easily be made non-portable.

> - Existing n-tier server architectures (JSP, Servlets, EJB, JDBC, JNDI and so on)
> - Easy scalability (possibly through JavaSpaces and Jini)

I'd agree with these, plus the threading issues mentioned in another post. I still think Java is a suitable language for building a spider, particularly as most of the work goes into waiting for servers, and it is perfectly possible (if awkward) to use a separate language for writing the parsers.

Look here for some more info on the problems with using Java:
http://www.research.compaq.com/SRC/mercator/papers/Java99/final.html

Richard

--
This message was sent by the Internet robots and spiders discussion list
([EMAIL PROTECTED]). For list server commands, send "help" in the body
of a message to "[EMAIL PROTECTED]".
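The timeout bug mentioned above can be worked around by setting explicit timeouts on the connection. A minimal sketch, assuming the connect/read timeout setters that arrived in Java 5 (before that, the usual workaround was a watchdog thread that closed a stalled connection); the URL and User-Agent string are placeholders:

```java
import java.net.HttpURLConnection;
import java.net.URL;

public class FetchWithTimeout {
    // Open a connection with explicit timeouts so a stalled or
    // misbehaving server cannot hang a crawler thread forever.
    public static HttpURLConnection open(String url, int timeoutMillis)
            throws Exception {
        HttpURLConnection conn =
                (HttpURLConnection) new URL(url).openConnection();
        conn.setConnectTimeout(timeoutMillis); // cap time to establish the TCP connection
        conn.setReadTimeout(timeoutMillis);    // cap time blocked in each read()
        // Hypothetical User-Agent; a real spider should identify itself.
        conn.setRequestProperty("User-Agent", "example-spider/0.1");
        return conn;
    }

    public static void main(String[] args) throws Exception {
        // Configuring the connection does not yet touch the network.
        HttpURLConnection c = open("http://example.com/", 5000);
        System.out.println(c.getConnectTimeout());
    }
}
```

Note that the timeouts are per-operation, not per-request: a server trickling one byte at a time can still hold a thread far longer than the read timeout, which is one reason high-volume spiders often add an overall deadline per fetch.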