Re: [htdig-dev] head_before_get attribute (was: 3.2RC1 Feature Freeze)

2003-10-25 Thread Gilles Detillieux
According to Gabriele Bartolini:
 At 13.45 22/10/2003 -0600, Neal Richter wrote:
OK, you've convinced me, it IS useful to have this switch be user
 controlled..  I wasn't aware of the non-compliant servers causing an
 issue.  Clearly 'automatic' behavior in that case is a bad thing.
 Go with option 2.
 
 Roger that. :-)

I guess the only safe way to automate the selection of this would be
for htdig to keep track, on a server by server basis, to see if a server
responds favourably to HEAD requests.  If it doesn't, then it would turn
off this action for this server, but otherwise it seems it would almost
always be an advantage to keep it on.  But now we're getting into the
area of feature requests, not bug fixes, so this should wait till after
the upcoming release.

If I'm not mistaken, as the code now stands, htdig will assume a document
is inaccessible if the HEAD request fails, and so it won't try the GET on
that document at all (unless head_before_get is explicitly set to false).
So, properly automating this selection would require some code changes
to the HtHTTP class to implement this -- not something we want to start
monkeying with at the eleventh hour before release.
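
A minimal sketch of the per-server bookkeeping described above, assuming a
hypothetical HeadSupportTracker class that is not part of ht://Dig or of the
HtHTTP class. The caller decides what counts as an unfavourable HEAD response
(for instance a 405 or 501 status rather than an ordinary 404), and would fall
back to a GET instead of treating the document as inaccessible:

```cpp
#include <map>
#include <string>

// Hypothetical helper (not ht://Dig code): remembers, per server,
// whether HEAD requests have been answered favourably.
class HeadSupportTracker {
public:
    // True if a HEAD should be tried before the GET for this server.
    // Servers we know nothing about are treated optimistically.
    bool shouldTryHead(const std::string& server) const {
        std::map<std::string, bool>::const_iterator it = headWorks_.find(server);
        return it == headWorks_.end() ? true : it->second;
    }

    // Records the outcome of a HEAD request. A single failure switches
    // HEAD off for that server for the rest of the dig.
    void recordHeadResult(const std::string& server, bool succeeded) {
        if (!succeeded)
            headWorks_[server] = false;
        else if (headWorks_.find(server) == headWorks_.end())
            headWorks_[server] = true;
    }

private:
    std::map<std::string, bool> headWorks_;  // server name -> HEAD usable?
};
```

With something like this, the retrieval loop would ask shouldTryHead() before
each document and call recordHeadResult() after each HEAD, so only the first
request to a non-compliant server pays the extra round trip.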

I think the current compromise is best, but it should be given a good
pounding to make sure it's solid.

-- 
Gilles R. Detillieux  E-mail: [EMAIL PROTECTED]
Spinal Cord Research Centre   WWW:http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)


---
This SF.net email is sponsored by: The SF.net Donation Program.
Do you like what SourceForge.net is doing for the Open
Source Community?  Make a contribution, and help us add new
features and functionality. Click here: http://sourceforge.net/donate/
___
ht://Dig Developer mailing list:
[EMAIL PROTECTED]
List information (subscribe/unsubscribe, etc.)
https://lists.sourceforge.net/lists/listinfo/htdig-dev


Re: [htdig-dev] head_before_get attribute (was: 3.2RC1 Feature Freeze)

2003-10-25 Thread Gabriele Bartolini
At 13.48 22/10/2003 -0600, Neal Richter wrote:
  Gabriele:  Please create a sourceforge bug for this when you change
it... and clue us all in on what the 'net change' is after the commits
;-).
Sorry ... I forgot to open the bug before. Done everything.

Ciao
-Gabriele
--
Gabriele Bartolini: Web Programmer, ht://Dig  IWA/HWG Member, ht://Check 
maintainer
Current Location: Melbourne, Victoria, Australia
[EMAIL PROTECTED] | http://www.prato.linux.it/~gbartolini | ICQ#129221447
 Leave every hope, ye who enter!, Dante Alighieri, Divine Comedy, The 
Inferno





Re: [htdig-dev] head_before_get attribute (was: 3.2RC1 Feature Freeze)

2003-10-22 Thread Lachlan Andrew
Greetings all,

I've only been following this thread loosely, but my opinions are:

1. In version 3.2.1 (or 3.3, or 4.0) there should be three possible 
settings:  true, false, auto.  That way the user has complete 
control, but doesn't need to exert it.

2. We are in feature freeze, and scheduled to release in one week's 
time, at the end of October.  We should minimise changes to the code.  
Has a bug report been filed for this issue yet?  Wasn't the plan to 
have no CVS commits without reference to a bug number?
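
As a rough illustration of the three-valued setting in point 1, here is a
sketch of how the attribute string might be mapped onto a tri-state. The enum
and the parseHeadBeforeGet function are hypothetical names, not existing
ht://Dig code, and only the literal strings "true", "false" and "auto" are
recognised here; falling back to true is an assumption based on the new
default discussed in this thread.

```cpp
#include <string>

// Hypothetical tri-state for head_before_get; 3.2 itself only knows
// true and false.
enum HeadBeforeGet { HBG_False, HBG_True, HBG_Auto };

// Maps a configuration string to the tri-state. Anything unrecognised
// yields 'fallback' (assumed here to be HBG_True).
inline HeadBeforeGet parseHeadBeforeGet(const std::string& value,
                                        HeadBeforeGet fallback = HBG_True) {
    if (value == "true")  return HBG_True;
    if (value == "false") return HBG_False;
    if (value == "auto")  return HBG_Auto;
    return fallback;
}
```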

Cheers,
Lachlan



On Wed, 22 Oct 2003 08:30, Gabriele Bartolini wrote:

 So ... we have 2 possibilities now:

 1) leave the code as is
 2) remove the overriding of the head before get in the incremental
 dig

 I must confess: I would prefer option 2, as I think users must
 have full control of the tool and IMHO by adding a default
 behaviour of HEAD before GET to the system we've done our part.

-- 
[EMAIL PROTECTED]
ht://Dig developer DownUnder  (http://www.htdig.org)


---
This SF.net email is sponsored by OSDN developer relations
Here's your chance to show off your extensive product knowledge
We want to know what you know. Tell us and you have a chance to win $100
http://www.zoomerang.com/survey.zgi?HRPT1X3RYQNC5V4MLNSV3E54
___
ht://Dig Developer mailing list:
[EMAIL PROTECTED]
List information (subscribe/unsubscribe, etc.)
https://lists.sourceforge.net/lists/listinfo/htdig-dev


Re: [htdig-dev] head_before_get attribute (was: 3.2RC1 Feature Freeze)

2003-10-22 Thread Neal Richter

Gabriele wrote:
 1) leave the code as is
 2) remove the overriding of the head before get in the incremental dig

 In both cases we need to write better documentation for this
 attribute (especially for option 2, where we should talk about the
 benefits of a HEAD call in the incremental dig).

 I must confess: I would prefer option 2, as I think users must have full
 control of the tool and IMHO by adding a default behaviour of HEAD before
 GET to the system we've done our part.

  OK, you've convinced me, it IS useful to have this switch be user
controlled..  I wasn't aware of the non-compliant servers causing an
issue.  Clearly 'automatic' behavior in that case is a bad thing.
Go with option 2.

  Thanks!

Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485






Re: [htdig-dev] head_before_get attribute (was: 3.2RC1 Feature Freeze)

2003-10-22 Thread Neal Richter

Lachlan wrote:
 2. We are in feature freeze, and scheduled to release in one week's
 time, at the end of October.  We should minimise changes to the code.
 Has a bug report been filed for this issue yet?  Wasn't the plan to
 have no CVS commits without reference to a bug number?

  Gabriele:  Please create a sourceforge bug for this when you change
it... and clue us all in on what the 'net change' is after the commits
;-).

  As far as the release goes, we need to get some kind of testing report
made and updated... I'll try and post something by tomorrow.

  Thanks.

Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485






Re: [htdig-dev] head_before_get attribute (was: 3.2RC1 Feature Freeze)

2003-10-21 Thread Gabriele Bartolini
I reread my e-mail and I think that I should have worded this sentence 
differently:

2) performing HEAD calls only in the incremental dig (either with or 
without persistent connections)
I meant: in the incremental dig perform just HEAD calls. I guess you guys 
understood: HEAD is performed only in incremental digs.

If so ... I am sorry about that and about my English.

Ciao
-Gabriele
--
Gabriele Bartolini: Web Programmer, ht://Dig  IWA/HWG Member, ht://Check 
maintainer
Current Location: Melbourne, Victoria, Australia
[EMAIL PROTECTED] | http://www.prato.linux.it/~gbartolini | ICQ#129221447
 Leave every hope, ye who enter!, Dante Alighieri, Divine Comedy, The 
Inferno





Re: [htdig-dev] head_before_get attribute (was: 3.2RC1 Feature Freeze)

2003-10-21 Thread Gabriele Bartolini
At 16.01 21/10/2003 -0500, Gilles Detillieux wrote:
 2) the server does not support HEAD (I have seen cases like this 
unfortunately)
OK, that sounds pretty important.  I hadn't heard that one before.
I meant that some server administrators may turn off the HEAD method (in 
Apache you can use the Limit directive).

but don't support the HEAD request.  Wouldn't this be an argument against
overriding head_before_get during an incremental dig?
I guess it is a matter of choosing the least painful solution. In the normal 
case (persistent connections on and head_before_get on) no overriding is 
done; however, in the incremental dig, one more request (HEAD) is made 
without success and hopefully, after that, the document gets retrieved. 
There is a bit of overhead for sure, but the question is: is it better to 
have a bit of overhead in a minority of cases, or to prevent users from 
getting the benefit of always using a working HEAD call when updating the 
database?

The other way is to remove the override and leave everything in the hands 
of the user (I would not mind this, provided of course that we write better 
documentation).

With the changes done yesterday we have moved towards a clearer situation 
anyway, because:
- head before get is now true by default
- head before get has been detached from persistent connections and has 
become independent

 3) cases where the persistent communication between htdig and the server
 does not work at 100%: there can be some problems with persistent
 connections and HEAD calls (I experience this kind of problems sometimes
 with ht://Check and some NT servers)
Again, is this going to be a problem if we don't allow turning off
head_before_get during an update dig?
I guess this could be fixable, because the problem comes up with persistent 
connections - which can still be disabled.

with these questionably compliant servers, then wouldn't they need a way
of turning off head_before_get unconditionally, whether it's an update
dig or an initial one?
Yes, that'd be great.

Again, I guess we have to balance what we can do to make things easier for 
the user while, at the same time, leaving users enough freedom to configure 
their systems the way they want. Also, with 3.2, the server and URL blocks 
have added more dimensions to the space of configurability available to 
users, and the more clear attributes are available, the closer the toy gets 
to perfect.

This is what I was getting at before about this option never being
explained adequately.
You're right.

  On the surface, it seemed to be rather useless,
but with these new revelations that have come out of your testing, it
seems there may indeed be a need for turning this off in some cases.
That's the sort of thing that should be documented so others (developers
and end-users) know what you'd use this for.
So ... we have 2 possibilities now:

1) leave the code as is
2) remove the overriding of the head before get in the incremental dig
In both cases we need to write better documentation for this 
attribute (especially for option 2, where we should talk about the 
benefits of a HEAD call in the incremental dig).

I must confess: I would prefer option 2, as I think users must have full 
control of the tool and IMHO by adding a default behaviour of HEAD before 
GET to the system we've done our part.
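
To make the difference between the two options concrete, here is a small
sketch. The function names are hypothetical and this is not the actual
Retriever or HtHTTP logic; it only restates the rule discussed in this thread,
where option 1 forces a HEAD during an incremental dig and option 2 lets the
head_before_get attribute decide on its own.

```cpp
// Option 1 (code left as is): an incremental dig overrides the
// attribute and always sends a HEAD before the GET.
constexpr bool useHeadOption1(bool headBeforeGet, bool incrementalDig) {
    return incrementalDig || headBeforeGet;
}

// Option 2 (override removed): only the user's head_before_get
// setting matters, whatever the kind of dig.
constexpr bool useHeadOption2(bool headBeforeGet, bool /*incrementalDig*/) {
    return headBeforeGet;
}
```

The two only disagree when head_before_get is false during an incremental
dig, which is exactly the case a user with a non-compliant server cares about.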

So tell me what you think, especially Gilles and Neal, who have 
followed this thread. If needed, I am more than happy to change the code 
again today.

Ciao ciao
-Gabriele
--
Gabriele Bartolini: Web Programmer, ht://Dig  IWA/HWG Member, ht://Check 
maintainer
Current Location: Melbourne, Victoria, Australia
[EMAIL PROTECTED] | http://www.prato.linux.it/~gbartolini | ICQ#129221447
 Leave every hope, ye who enter!, Dante Alighieri, Divine Comedy, The 
Inferno





Re: [htdig-dev] head_before_get attribute (was: 3.2RC1 Feature Freeze)

2003-10-20 Thread Gilles Detillieux
According to Gabriele Bartolini:
I think what we've had here is informative debate.  You as much as
 anyone else wrote the networking code, so for me it's your decision.  I
 think the new TRUE default is fine.
 
 OK. Any other opinions?

I think it was just a matter of not understanding what the attribute did or
didn't do, and in which circumstances it would be useful to change it.
Because of the potential for serious performance degradation when you get it
wrong, I think it would be helpful if the code automatically did the right
thing in most circumstances, and if the documentation for this attribute
made it clear in which circumstances it would make sense to turn it off.

If you've perfected this logic in ht://Check, then we should probably
 consider syncing with your net code after 3.2 is done.
 
 So ... is it ok for you guys if I go on with the Retriever, Document and 
 HtHTTP patch as suggested in the previous e-mails?

I think that's what Neal was getting at when he said it's your decision.
You wrote the networking code, so you know better than anyone else what's
needed to make this particular change.  It sounds reasonable to me that
you'd need to make changes to these classes, as that's where the needed
decisions must be made about the appropriate default action.

 Basically, in order to always perform a HEAD call during an incremental 
 indexing, I need to store the information in both the Retriever and 
 Document class. Is that all right with you? In particular, I suggest this
 enum:
 
     enum RetrieverType {
         Retriever_Initial,
         Retriever_Incremental
     };
 
 and then change the constructor this way:
 
     Retriever(RetrieverLog flags = Retriever_noLog,
               RetrieverType t = Retriever_Initial);
 
 In 'htdig.cc', we check whether the dig is an initial dig or not and:
 
     if (!initial) // Switch the retriever type to Incremental
         retriever_type = Retriever_Incremental;
 
 therefore, when we instantiate the main retriever object, we simply 
 add this:
 
     Retriever retriever(Retriever_logUrl, retriever_type);
 
 
 Please let me know.

Well, it seems to me that there are actually two different cases where
htdig does an initial dig.  The obvious one is when the user specifies
-i, which sets the initial flag.  The less obvious one is when htdig is
run without -i, but with no existing database, or with an empty one.
What matters is whether there are URLs in the database or not.  If there
are none, then you'll never reject a document as not changed.
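
A sketch of the selection rule this implies, reusing the RetrieverType enum
proposed above. The chooseRetrieverType function and its databaseHasUrls
parameter are hypothetical; how htdig would actually find out whether the
database already holds URLs is left open here.

```cpp
enum RetrieverType {
    Retriever_Initial,
    Retriever_Incremental
};

// Hypothetical helper: a dig counts as initial either when -i was given
// or when there are no URLs in the database to compare against.
constexpr RetrieverType chooseRetrieverType(bool initialFlag,
                                            bool databaseHasUrls) {
    return (initialFlag || !databaseHasUrls) ? Retriever_Initial
                                             : Retriever_Incremental;
}
```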

-- 
Gilles R. Detillieux  E-mail: [EMAIL PROTECTED]
Spinal Cord Research Centre   WWW:http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)




Re: [htdig-dev] head_before_get attribute (was: 3.2RC1 Feature Freeze)

2003-10-15 Thread Jesse op den Brouw
I vote -1 for killing it if its function is described clearly and doesn't
change between any two succeeding versions of htdig, and 0 otherwise.
There's no harm in having this option, is there?
If you don't want it, just turn it off.

- Original Message - 
From: Gabriele Bartolini [EMAIL PROTECTED]

I'm with you on this one.. we should just kill head_before_get.  I would
 vote for killing it instead of hacking the logic.

 Hi guys, I hope that after my previous message you could change your mind.
 I vote -1 for killing this attribute.

 Ciao,
 -Gabriele

--Jesse



---
This SF.net email is sponsored by: SF.net Giveback Program.
SourceForge.net hosts over 70,000 Open Source Projects.
See the people who have HELPED US provide better services:
Click here: http://sourceforge.net/supporters.php
___
ht://Dig Developer mailing list:
[EMAIL PROTECTED]
List information (subscribe/unsubscribe, etc.)
https://lists.sourceforge.net/lists/listinfo/htdig-dev


Re: [htdig-dev] head_before_get attribute (was: 3.2RC1 Feature Freeze)

2003-10-15 Thread Neal Richter

 but maybe in a future release we could use other HTTP headers (e.g. cookies,
 language, etc.) and a pre-emptive HEAD could save time in an initial dig as
 well.

  Yep.. even on an initial dig HEAD is a good idea.. unless the website is
almost all HTML pages with few images... which seems pretty pie-in-the-sky
at this point.

 2) I share the library with ht://Check which massively uses this option as
 it has to retrieve any document - images too - and a HEAD call could save a
 lot of time in the initial dig. I'd love to keep the logic of the net
 library as similar as possible.

 Please let me know if the Retriever and Document class changes make sense
 to you guys and I will modify the code.

  I think what we've had here is informative debate.  You as much as
anyone else wrote the networking code, so for me it's your decision.  I
think the new TRUE default is fine.

  If you've perfected this logic in ht://Check, then we should probably
consider syncing with your net code after 3.2 is done.

Thanks.

Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485






Re: [htdig-dev] head_before_get attribute (was: 3.2RC1 Feature Freeze)

2003-10-14 Thread Gabriele Bartolini

  I'm with you on this one.. we should just kill head_before_get.  I would
vote for killing it instead of hacking the logic.
Hi guys, I hope that after my previous message you could change your mind. 
I vote -1 for killing this attribute.

Ciao,
-Gabriele
--
Gabriele Bartolini: Web Programmer, ht://Dig  IWA/HWG Member, ht://Check 
maintainer
Current Location: Melbourne, Victoria, Australia
[EMAIL PROTECTED] | http://www.prato.linux.it/~gbartolini | ICQ#129221447
 Leave every hope, ye who enter!, Dante Alighieri, Divine Comedy, The 
Inferno


