Henri Sivonen wrote:

That seems like a bad optimization. Adding an off-the-shelf HTML parser to a bot is much easier than tuning the general crawling functionality and task-specific functionality of a bot.

I suspect this will require far more of the bot than merely parsing HTML. Many login forms today effectively require human intelligence to process. After all, it's not merely logging in that's at issue but registration. Frankly, the current state of the art is one of the most broken and misdesigned aspects of HTML 4, and that's saying a lot. :-(

I'll have to consider the detailed proposal, but I tend to think that the solution lies in allowing forms to integrate better with HTTP authentication, not in eliminating HTTP authentication. A form action should be able to set the necessary HTTP headers. Also, it should be possible for a form to easily tell the web browser to log out. And it would be nice to have something stronger than HTTP digest authentication for unencrypted channels, though I'd have to leave it to the experts to say if that's possible.
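
For anyone who hasn't looked at what digest authentication actually sends, here is a rough Python sketch of the client-side response computation as I understand it from RFC 2617 (MD5, qop=auth). The function and parameter names are my own, chosen for illustration; they are not part of any proposal in this thread. The sample values are the ones from the example in the RFC.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Return the lowercase hex MD5 digest of a string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, realm, password, method, uri,
                    nonce, nc, cnonce, qop="auth"):
    """Compute the 'response' field of an Authorization: Digest header.

    Sketch of the RFC 2617 calculation for algorithm=MD5 and qop=auth only;
    auth-int and MD5-sess are deliberately left out.
    """
    ha1 = md5_hex(f"{username}:{realm}:{password}")   # secret-derived part
    ha2 = md5_hex(f"{method}:{uri}")                  # request-derived part
    return md5_hex(f"{ha1}:{nonce}:{nc}:{cnonce}:{qop}:{ha2}")


if __name__ == "__main__":
    # Values taken from the worked example in RFC 2617.
    print(digest_response(
        username="Mufasa",
        realm="testrealm@host.com",
        password="Circle Of Life",
        method="GET",
        uri="/dir/index.html",
        nonce="dcd98b7102dd2f0e8b11d0f600bfb0c093",
        nc="00000001",
        cnonce="0a4f113b",
    ))
```

The password never crosses the wire, but everything rests on MD5 and on the server's nonce, which is why I'd like to see something stronger for unencrypted channels.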

--
Elliotte Rusty Harold  [EMAIL PROTECTED]
Refactoring HTML Just Published!
http://www.amazon.com/exec/obidos/ISBN=0321503635/ref=nosim/cafeaulaitA
