Re: [htdig-dev] head_before_get attribute (was: 3.2RC1 Feature Freeze)

Gilles Detillieux Mon, 20 Oct 2003 15:04:50 -0700

According to Gabriele Bartolini:
> >   I think what we've had here is informative debate.  You as much as
> >anyone else wrote the networking code, so for me it's your decision.  I
> >think the new TRUE default is fine.
> 
> OK. Any other opinions?


I think it was just a matter of not understanding what the attribute did or
didn't do, and in which circumstances it would be useful to change it.
Because of the potential for serious performance degradation when you get it
wrong, I think it would be helpful if the code automatically did the right
thing in most circumstances, and if the documentation for this attribute
made it clear in which circumstances it would make sense to turn it off.

> >   If you've perfected this logic in ht://Check, then we should probably
> >consider syncing with your net code after 3.2 is done.
> 
> So ... is it ok for you guys if I go on with the Retriever, Document and 
> HtHTTP patch as suggested in the previous e-mails?

I think that's what Neal was getting at when he said it's your decision.
You wrote the networking code, so you know better than anyone else what's
needed to make this particular change.  It sounds reasonable to me that
you'd need to make changes to these classes, as that's where the needed
decisions must be made about the appropriate default action.

> Basically, in order to always perform a HEAD call during incremental 
> indexing, I need to store the information in both the Retriever and 
> Document classes. Is that OK with you? In particular, I suggest this enum:
> 
>          enum  RetrieverType {
>                  Retriever_Initial,
>                  Retriever_Incremental
>          };
> 
> and then change the constructor this way:
> 
>          Retriever(RetrieverLog flags = Retriever_noLog,
>                    RetrieverType t = Retriever_Initial);
> 
> In 'htdig.cc', we check whether the dig is an initial dig or not and:
> 
>          if(!initial) // Switch the retriever type to Incremental
>                  retriever_type = Retriever_Incremental;
> 
> therefore, when we instantiate the main retriever object, we simply 
> add this:
> 
>          Retriever retriever(Retriever_logUrl, retriever_type);
> 
> Please let me know.

Well, it seems to me that there are actually two different cases where
htdig does an initial dig.  The obvious one is when the user specifies
-i, which sets the initial flag.  The less obvious one is when htdig is
run without -i, but with no existing database, or with an empty one.
What matters is whether there are URLs in the database or not.  If there
are none, then you'll never reject a document as "not changed".
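
A minimal sketch of how that decision could be written, reusing the
RetrieverType enum proposed above. The helper name ChooseRetrieverType and
its two parameters are hypothetical, not existing htdig code: initial_flag
stands for the -i option, and stored_url_count stands for the number of URLs
found in the existing database (0 if the database is missing or empty).

```cpp
#include <cstddef>

// Same enum as in the proposal quoted above.
enum RetrieverType {
    Retriever_Initial,
    Retriever_Incremental
};

// Hypothetical helper (not part of htdig): pick the retriever type.
// A dig counts as initial if -i was given OR if there are no URLs in
// the database, since in that case no document can ever be rejected
// as "not changed".
constexpr RetrieverType ChooseRetrieverType(bool initial_flag,
                                            std::size_t stored_url_count)
{
    return (initial_flag || stored_url_count == 0)
               ? Retriever_Initial
               : Retriever_Incremental;
}
```

With something like this, htdig.cc would pass the result to the Retriever
constructor instead of testing the initial flag alone.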

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)


