On Mon, 2004-11-01 at 20:37, Henri Yandell wrote:
> Does HttpClient have anything to parse a robots.txt file?

Hi Henri,

No, it does not. At the moment we are trying to keep HttpClient
completely content-agnostic. That said, as soon as HttpClient 3.0 goes
RC (or maybe even earlier) we'll embark on a long-planned API redesign.
One of the goals we have in mind is to expand the scope of the project
beyond the client side: break the monolithic HttpClient into smaller,
loosely coupled components and eventually evolve it into a flexible
toolset of HTTP components that can be used to rapidly assemble HTTP
agents, web crawlers, HTTP proxies, and lightweight embedded HTTP
servers. At that point a robots.txt parser would be a very welcome
contribution.

> 
> If not, would anyone be interested in http://www.osjava.org/norbert/ ?
> 
> I'd like to put it in the sandbox and thought that it would be of a
> lot of interest to the HttpClient project and users.
> 

Can we keep it in the sandbox for a while? As soon as the HttpClient 4.0
API starts shaping up, the robots.txt parser could be migrated to Jakarta
HttpClient to lay the foundation for a web crawler subcomponent.

Folks, what do you think?

Oleg


