Ciao Gabriele!

> this message. At this point, I think, we have to change the Retriever class
> because, as it is now, it doesn't take advantage of persistent connections.
> On every request, indeed, the server changes.

Yes, this is true. At the moment, the Server and Retriever classes are
buggy and in serious need of a tuneup. For another example, remember that
if you set server_wait_time, the wait can outlast the server's keep-alive
timeout, so a persistent connection might expire before the next request!

>         In this case I think we have to get the queue empty for every server
> that supports pcs. And, what about delay? Let me know about this.

I don't know. I think a delay on a persistent connection is silly, but
perhaps we could add a configuration attribute that sets how many documents
to retrieve before a wait. The default can be -1, meaning all documents on
a given server are retrieved before going on to another one.
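
A minimal sketch of how such a setting could be read, in Python only for
brevity. The name docs_before_wait is a placeholder, not an existing
attribute, and values are assumed to be either -1 or a positive count:

```python
def batch_limit(docs_before_wait, queued):
    """Return how many queued documents to fetch before the next wait.

    docs_before_wait is a hypothetical setting: -1 means "no wait until
    this server's queue is drained", a positive value caps the batch.
    """
    if docs_before_wait < 0:
        return queued
    return min(docs_before_wait, queued)
```

With the default of -1 the whole queue for a server goes out over one
connection; with a positive value the wait only happens between batches.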

To foul all this up, though, we have to worry about hopcount! To ensure
accurate hopcount and allow restricting by hopcount, we must visit links
in a level-order manner. So the complicated loop might look something like
this:

while (Server = GetNextServer())
    if (persistent_connection)
        grab n documents with identical hopcount
    else
        grab next document with current hopcount

Here, GetNextServer would order servers based on hopcount first and delay
time next. I don't know if the explanation is very good, but I'd be glad
to flesh it out a bit. I also have a pile of notes on the Retriever class.
Since it didn't really get updated enough with the WordList changes, it's
very buggy.
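
To make the loop a little more concrete, here is a rough sketch in Python
(chosen only for brevity; this is not the Retriever code). The Server
class, its per-hopcount queues, get_next_server and crawl_order are all
made up for illustration, and the queues are filled in advance rather than
by parsing documents. It only shows the ordering: lowest hopcount first,
then shortest delay, with batches only on persistent connections.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Server:
    """Hypothetical stand-in for a server with per-hopcount URL queues."""
    name: str
    persistent: bool
    delay: float = 0.0  # stand-in for this server's wait time
    queues: dict = field(default_factory=dict)  # hopcount -> deque of URLs

    def push(self, url, hopcount):
        self.queues.setdefault(hopcount, deque()).append(url)

    def lowest_hopcount(self):
        levels = [hop for hop, queue in self.queues.items() if queue]
        return min(levels) if levels else None


def get_next_server(servers):
    """Pick the server with the lowest pending hopcount, then lowest delay."""
    pending = [s for s in servers if s.lowest_hopcount() is not None]
    if not pending:
        return None
    return min(pending, key=lambda s: (s.lowest_hopcount(), s.delay))


def crawl_order(servers, n=-1):
    """Return (server, hopcount, url) tuples in the order they would be fetched."""
    order = []
    while (server := get_next_server(servers)) is not None:
        hop = server.lowest_hopcount()
        queue = server.queues[hop]
        if server.persistent:
            # Grab up to n documents with identical hopcount (-1 means all).
            count = len(queue) if n < 0 else max(1, min(n, len(queue)))
        else:
            # No persistent connection: one document, then pick again.
            count = 1
        for _ in range(count):
            order.append((server.name, hop, queue.popleft()))
    return order
```

Because a server is always chosen by its lowest pending hopcount, no
document at hopcount h+1 is fetched while any server still has one queued
at hopcount h, which is the level-order property described above.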

>         I am also very curious about Geoff's External Transport class.
> But what is it about?

Hopefully I can bolt myself to the keyboard tonight and finish it. It will
allow people to write scripts to access other protocols. So you might use
curl or wget or ncftp to support ftp:// or https:// protocols.
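
Roughly the idea, sketched in Python rather than as the actual class: map
a URL scheme to an external command and let that program do the transfer.
The EXTERNAL_HANDLERS table, build_command and fetch are hypothetical names
for illustration; the curl and wget options used are the ordinary ones for
quiet output to stdout.

```python
import subprocess
from urllib.parse import urlsplit

# Hypothetical scheme -> command table; a real setup would come from config.
EXTERNAL_HANDLERS = {
    "https": ["curl", "--silent", "--show-error", "--location"],
    "ftp": ["wget", "--quiet", "--output-document=-"],
}


def build_command(url):
    """Return the argv list for the external program that handles this URL."""
    scheme = urlsplit(url).scheme.lower()
    handler = EXTERNAL_HANDLERS.get(scheme)
    if handler is None:
        raise ValueError(f"no external handler for scheme {scheme!r}")
    return handler + [url]


def fetch(url, timeout=30):
    """Run the external program and return the document body as bytes."""
    result = subprocess.run(
        build_command(url), capture_output=True, timeout=timeout, check=True
    )
    return result.stdout
```

Passing the URL as a separate argv element, rather than building a shell
string, avoids quoting problems with odd characters in URLs.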

-Geoff

