Re: [htdig-dev] head_before_get attribute (was: 3.2RC1 Feature Freeze)
According to Gabriele Bartolini:
At 13.45 22/10/2003 -0600, Neal Richter wrote:
OK, you've convinced me, it IS useful to have this switch be user controlled. I wasn't aware of the non-compliant servers causing an issue. Clearly 'automatic' behavior in that case is a bad thing. Go with option 2.

Roger that. :-)

I guess the only safe way to automate the selection of this would be for htdig to keep track, on a server-by-server basis, of whether a server responds favourably to HEAD requests. If it doesn't, then htdig would turn off this action for that server, but otherwise it seems it would almost always be an advantage to keep it on. But now we're getting into the area of feature requests, not bug fixes, so this should wait till after the upcoming release.

If I'm not mistaken, as the code now stands, htdig will assume a document is inaccessible if the HEAD request fails, and so it won't try the GET on that document at all (unless head_before_get is explicitly set to false). So, properly automating this selection would require some code changes to the HtHTTP class -- not something we want to start monkeying with at the eleventh hour before release. I think the current compromise is best, but it should be given a good pounding to make sure it's solid.

--
Gilles R. Detillieux E-mail: [EMAIL PROTECTED]
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba Winnipeg, MB R3E 3J7 (Canada)

ht://Dig Developer mailing list: [EMAIL PROTECTED]
List information (subscribe/unsubscribe, etc.) https://lists.sourceforge.net/lists/listinfo/htdig-dev
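The per-server tracking suggested above could be sketched roughly as follows. This is only an illustration under stated assumptions, not code from ht://Dig: the names HeadSupportTracker, recordHeadFailure, FetchPlan and planFetch are all hypothetical, and nothing here touches the real HtHTTP class.

```cpp
#include <string>
#include <unordered_set>

// Hypothetical helper, not part of ht://Dig: remembers which servers have
// already failed a HEAD request so that later fetches can go straight to GET.
class HeadSupportTracker {
public:
    // HEAD is worth trying unless this server is known to have failed one.
    bool shouldTryHead(const std::string& server) const {
        return noHeadServers_.count(server) == 0;
    }

    // Called by the fetching code when a HEAD request to 'server' fails.
    void recordHeadFailure(const std::string& server) {
        noHeadServers_.insert(server);
    }

private:
    std::unordered_set<std::string> noHeadServers_;
};

// The two request sequences discussed in the thread.
enum class FetchPlan { HeadThenGet, GetOnly };

// Pick the request sequence for one document on 'server'.
FetchPlan planFetch(const HeadSupportTracker& tracker, const std::string& server) {
    return tracker.shouldTryHead(server) ? FetchPlan::HeadThenGet
                                         : FetchPlan::GetOnly;
}
```

What counts as a HEAD failure is deliberately left to the caller. A status such as 405 or 501 suggests the server rejects the method itself, whereas a 404 only says something about that one document, so a real implementation would probably need to tell these apart before switching a whole server to GET-only.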
Re: [htdig-dev] head_before_get attribute (was: 3.2RC1 Feature Freeze)
At 13.48 22/10/2003 -0600, Neal Richter wrote:
Gabriele: Please create a SourceForge bug for this when you change it... and clue us all in on what the 'net change' is after the commits ;-).

Sorry ... I forgot to open the bug beforehand. Everything is done now.

Ciao
-Gabriele

--
Gabriele Bartolini: Web Programmer, ht://Dig IWA/HWG Member, ht://Check maintainer
Current Location: Melbourne, Victoria, Australia
[EMAIL PROTECTED] | http://www.prato.linux.it/~gbartolini | ICQ#129221447
Leave every hope, ye who enter!, Dante Alighieri, Divine Comedy, The Inferno
Re: [htdig-dev] head_before_get attribute (was: 3.2RC1 Feature Freeze)
Greetings all,

I've only been following this thread loosely, but my opinions are:

1. In version 3.2.1 (or 3.3, or 4.0) there should be three possible settings: true, false, auto. That way the user has complete control, but doesn't need to exert it.

2. We are in feature freeze, and scheduled to release in one week's time, at the end of October. We should minimise changes to the code.

Has a bug report been filed for this issue yet? Wasn't the plan to have no CVS commits without reference to a bug number?

Cheers,
Lachlan

On Wed, 22 Oct 2003 08:30, Gabriele Bartolini wrote:
So ... we have 2 possibilities now:
1) leave the code as is
2) remove the overriding of the head before get in the incremental dig
I must confess, I would prefer option 2, as I think users must have full control of the tool, and IMHO by adding a default behaviour of HEAD before GET to the system we've done our part.

--
[EMAIL PROTECTED]
ht://Dig developer DownUnder (http://www.htdig.org)
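The three-way setting proposed in point 1 above might look something like the sketch below. It is an assumption-laden illustration only: head_before_get is a plain boolean in the code under discussion, and the names HeadBeforeGet, parseHeadBeforeGet and sendHeadFirst are hypothetical. Falling back to "always" for unrecognised values is also an assumption, chosen to mirror the new default of true.

```cpp
#include <string>

// Hypothetical tri-state for a future head_before_get attribute.
enum class HeadBeforeGet { Never, Always, Auto };

// Map a configuration string to the tri-state. Anything other than
// "false" or "auto" is treated as "true" (assumed default).
HeadBeforeGet parseHeadBeforeGet(const std::string& value) {
    if (value == "false") return HeadBeforeGet::Never;
    if (value == "auto")  return HeadBeforeGet::Auto;
    return HeadBeforeGet::Always;
}

// Decide whether to send HEAD before GET. 'serverHandlesHead' is whatever
// the digger has learned about this server so far; it only matters in Auto.
bool sendHeadFirst(HeadBeforeGet setting, bool serverHandlesHead) {
    switch (setting) {
        case HeadBeforeGet::Never:  return false;
        case HeadBeforeGet::Always: return true;
        case HeadBeforeGet::Auto:   return serverHandlesHead;
    }
    return true;  // not reached; keeps compilers quiet
}
```

With this shape, true and false keep their current meaning, so existing configurations would behave the same, and only auto depends on what has been observed about a server.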
Re: [htdig-dev] head_before_get attribute (was: 3.2RC1 Feature Freeze)
Gabriele wrote:
1) leave the code as is
2) remove the overriding of the head before get in the incremental dig
In both cases we need to write better documentation for this attribute (especially for option 2, where we should talk about the benefits of a HEAD call in the incremental dig). I must confess, I would prefer option 2, as I think users must have full control of the tool, and IMHO by adding a default behaviour of HEAD before GET to the system we've done our part.

OK, you've convinced me, it IS useful to have this switch be user controlled. I wasn't aware of the non-compliant servers causing an issue. Clearly 'automatic' behavior in that case is a bad thing. Go with option 2.

Thanks!

Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485
Re: [htdig-dev] head_before_get attribute (was: 3.2RC1 Feature Freeze)
Lachlan wrote:
2. We are in feature freeze, and scheduled to release in one week's time, at the end of October. We should minimise changes to the code. Has a bug report been filed for this issue yet? Wasn't the plan to have no CVS commits without reference to a bug number?

Gabriele: Please create a SourceForge bug for this when you change it... and clue us all in on what the 'net change' is after the commits ;-).

As far as the release goes, we need to get some kind of testing report made and updated... I'll try to post something by tomorrow.

Thanks.

Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485
Re: [htdig-dev] head_before_get attribute (was: 3.2RC1 Feature Freeze)
I reread my e-mail, and I think I should have written this sentence differently:

2) performing HEAD calls only in the incremental dig (either with or without persistent connections)

I meant: in the incremental dig, perform just HEAD calls. I guess you guys understood it as: HEAD is performed only in incremental digs. If so ... I am sorry about that and my English.

Ciao
-Gabriele

--
Gabriele Bartolini: Web Programmer, ht://Dig IWA/HWG Member, ht://Check maintainer
Current Location: Melbourne, Victoria, Australia
[EMAIL PROTECTED] | http://www.prato.linux.it/~gbartolini | ICQ#129221447
Leave every hope, ye who enter!, Dante Alighieri, Divine Comedy, The Inferno
Re: [htdig-dev] head_before_get attribute (was: 3.2RC1 Feature Freeze)
At 16.01 21/10/2003 -0500, Gilles Detillieux wrote:
2) the server does not support HEAD (I have seen cases like this unfortunately)
OK, that sounds pretty important. I hadn't heard that one before.

I meant that some server administrators may turn off the HEAD method (in Apache you can use the Limit directive).

but don't support the HEAD request. Wouldn't this be an argument against overriding head_before_get during an incremental dig?

I guess it is a matter of choosing the less painful solution. In the normal case (persistent connections on and head_before_get on) overriding is not done; however, in the incremental dig, one more request (HEAD) is made without success and, hopefully, the document then gets retrieved. There is a bit of overhead for sure, but the question is: is it better to have a bit of overhead in a minority of cases, or to prevent users from getting the benefit of always using a working HEAD call when updating the database? The other way is to remove the override and leave everything in the hands of the user (I would not mind this, provided of course that we write better documentation).

With the changes done yesterday we have moved towards a clearer situation anyway, because:
- head before get is now true by default
- head before get has been detached from persistent connections and has become independent

3) cases where the persistent communication between htdig and the server does not work at 100%: there can be some problems with persistent connections and HEAD calls (I experience this kind of problem sometimes with ht://Check and some NT servers)
Again, is this going to be a problem if we don't allow turning off head_before_get during an update dig?

I guess this could be fixable, because the problem comes up with persistent connections, which can still be disabled.

with these questionably compliant servers, then wouldn't they need a way of turning off head_before_get unconditionally, whether it's an update dig or an initial one?

Yes, that'd be great.
Again, I guess we have to balance what we can do to make things easier for the user while, at the same time, leaving users enough freedom to configure their systems the way they want. Also, with 3.2, the server and URL blocks have added more dimensions to the space of configurability available to users, and the more clear attributes are available, the closer the toy gets to perfect.

This is what I was getting at before about this option never being explained adequately.

You're right.

On the surface, it seemed to be rather useless, but with these new revelations that have come out of your testing, it seems there may indeed be a need for turning this off in some cases. That's the sort of thing that should be documented so others (developers and end-users) know what you'd use this for.

So ... we have 2 possibilities now:
1) leave the code as is
2) remove the overriding of the head before get in the incremental dig

In both cases we need to write better documentation for this attribute (especially for option 2, where we should talk about the benefits of a HEAD call in the incremental dig). I must confess, I would prefer option 2, as I think users must have full control of the tool, and IMHO by adding a default behaviour of HEAD before GET to the system we've done our part.

So tell me what you think, especially you Gilles and Neal, who have followed this thread. I am more than happy to change the code again today if needed.

Ciao ciao
-Gabriele

--
Gabriele Bartolini: Web Programmer, ht://Dig IWA/HWG Member, ht://Check maintainer
Current Location: Melbourne, Victoria, Australia
[EMAIL PROTECTED] | http://www.prato.linux.it/~gbartolini | ICQ#129221447
Leave every hope, ye who enter!, Dante Alighieri, Divine Comedy, The Inferno
Re: [htdig-dev] head_before_get attribute (was: 3.2RC1 Feature Freeze)
According to Gabriele Bartolini:
I think what we've had here is informative debate. You as much as anyone else wrote the networking code, so for me it's your decision. I think the new TRUE default is fine.
OK. Any other opinions?

I think it was just a matter of not understanding what the attribute did or didn't do, and in which circumstances it would be useful to change it. Because of the potential for serious performance degradation when you get it wrong, I think it would be helpful if the code automatically did the right thing in most circumstances, and if the documentation for this attribute made it clear in which circumstances it would make sense to turn it off.

If you've perfected this logic in ht://Check, then we should probably consider syncing with your net code after 3.2 is done.
So ... is it ok for you guys if I go on with the Retriever, Document and HtHTTP patch as suggested in the previous e-mails?

I think that's what Neal was getting at when he said it's your decision. You wrote the networking code, so you know better than anyone else what's needed to make this particular change. It sounds reasonable to me that you'd need to make changes to these classes, as that's where the needed decisions must be made about the appropriate default action.

Basically, in order to always perform a HEAD call during an incremental indexing, I need to store the information in both the Retriever and Document classes. Is that all right with you?
In particular, I suggest this enum:

    enum RetrieverType { Retriever_Initial, Retriever_Incremental };

and then change the constructor this way:

    Retriever(RetrieverLog flags = Retriever_noLog, RetrieverType t = Retriever_Initial);

In 'htdig.cc', we check whether the dig is an initial dig or not and:

    if (!initial)
        // Switch the retriever type to Incremental
        retriever_type = Retriever_Incremental;

therefore, when we instantiate the main retriever object, we just simply add this:

    Retriever retriever(Retriever_logUrl, retriever_type);

Please let me know.

Well, it seems to me that there are actually two different cases where htdig does an initial dig. The obvious one is when the user specifies -i, which sets the initial flag. The less obvious one is when htdig is run without -i, but with no existing database, or with an empty one. What matters is whether there are URLs in the database or not. If there are none, then you'll never reject a document as not changed.

--
Gilles R. Detillieux E-mail: [EMAIL PROTECTED]
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba Winnipeg, MB R3E 3J7 (Canada)
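Gilles's point about the two kinds of initial dig could be folded into the proposed selection roughly as sketched here. The enum and its values are taken from the proposal above; the helper chooseRetrieverType and its storedUrlCount parameter are hypothetical, and how htdig would actually count the URLs already in the database is left open.

```cpp
// Enum as proposed in the thread.
enum RetrieverType { Retriever_Initial, Retriever_Incremental };

// Hypothetical helper: 'initialFlag' is true when -i was given;
// 'storedUrlCount' is the number of URLs already in the database
// (zero for a missing or empty database).
RetrieverType chooseRetrieverType(bool initialFlag, unsigned long storedUrlCount) {
    // With no stored URLs, no document can ever be rejected as unchanged,
    // so the dig is effectively an initial one even without -i.
    if (initialFlag || storedUrlCount == 0)
        return Retriever_Initial;
    return Retriever_Incremental;
}
```

The result would then be passed to the constructor in place of the value derived from the initial flag alone, for example `Retriever retriever(Retriever_logUrl, chooseRetrieverType(initial, count));`, where `count` is again an assumed variable.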
Re: [htdig-dev] head_before_get attribute (was: 3.2RC1 Feature Freeze)
I vote -1 for killing it if its function is described clearly and doesn't change between any two succeeding versions of htdig, and 0 otherwise. There's no harm in having this option, is there? If you don't want it, just turn it off.

- Original Message -
From: Gabriele Bartolini [EMAIL PROTECTED]

I'm with you on this one.. we should just kill head_before_get. I would vote for killing it instead of hacking the logic.

Hi guys, I hope that after my previous message you could change your mind. I vote -1 for killing this attribute.

Ciao,
-Gabriele

--Jesse
Re: [htdig-dev] head_before_get attribute (was: 3.2RC1 Feature Freeze)
but maybe in a future release we could use other HTTP headers (e.g. cookies, language, etc.) and a pre-emptive HEAD could save time in an initial dig as well.

Yep.. even on an initial dig HEAD is a good idea.. unless the website is almost all HTML pages with few images... which seems pretty pie-in-the-sky at this point.

2) I share the library with ht://Check, which uses this option heavily as it has to retrieve any document, images too, and a HEAD call could save a lot of time in the initial dig. I'd love to keep the logic of the net library as similar as possible. Please let me know if the changes to the Retriever and Document classes make sense to you guys and I will modify the code.

I think what we've had here is informative debate. You as much as anyone else wrote the networking code, so for me it's your decision. I think the new TRUE default is fine.

If you've perfected this logic in ht://Check, then we should probably consider syncing with your net code after 3.2 is done.

Thanks.

Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485
Re: [htdig-dev] head_before_get attribute (was: 3.2RC1 Feature Freeze)
I'm with you on this one.. we should just kill head_before_get. I would vote for killing it instead of hacking the logic.

Hi guys, I hope that after my previous message you could change your mind. I vote -1 for killing this attribute.

Ciao,
-Gabriele

--
Gabriele Bartolini: Web Programmer, ht://Dig IWA/HWG Member, ht://Check maintainer
Current Location: Melbourne, Victoria, Australia
[EMAIL PROTECTED] | http://www.prato.linux.it/~gbartolini | ICQ#129221447
Leave every hope, ye who enter!, Dante Alighieri, Divine Comedy, The Inferno