On Mon, 2004-11-01 at 20:37, Henri Yandell wrote:
> Does HttpClient have anything to parse a robots.txt file?

Hi Henri,

No, it does not. At the moment we are trying to keep HttpClient completely content-agnostic. That said, as soon as HttpClient 3.0 goes RC (or maybe even earlier), we'll embark on a long-planned API redesign. One of our goals is to expand the scope of the project beyond the client side: break the monolithic HttpClient into smaller, loosely coupled components and eventually evolve HttpClient into a flexible toolset of HTTP components that can be used to rapidly assemble HTTP agents, web crawlers, HTTP proxies, and lightweight embedded HTTP servers. At that point a robots.txt parser would be a very welcome contribution.

>
> If not, would anyone be interested in http://www.osjava.org/norbert/ ?
>
> I'd like to put it in the sandbox and thought that it would be of a
> lot of interest to the HttpClient project and users.
>

Can we keep it in the sandbox for a while? As soon as the HttpClient 4.0 API starts shaping up, the robots.txt parser could be migrated to Jakarta HttpClient to lay the foundation for a web crawler subcomponent.

Folks, what do you think?

Oleg
