Re: [htdig-dev] Licenses..

2004-07-22 Thread Gilles Detillieux
According to Robert Ribnitz: I am in the process of making my 'htdig' package (the one based on 3.1.6) lintian-clean (Lintian is a special program that checks for conformance of the package to the debian packaging guidelines). I found the following licenses (which I need to list in

Re: [htdig-dev] new rundig script (was: Various things..)

2004-07-22 Thread Gilles Detillieux
Yes, in both 3.1.6 and 3.2.0b6, and indeed as far back as I can recall, the standard rundig script bundled with ht://Dig (installed from installdir) has always passed -i to htdig by default. I have no problems with Robert's modified script, but be well aware that bundling this one will mean that

Re: [htdig-dev] rundig again..

2004-07-22 Thread Gilles Detillieux
According to Robert Ribnitz: I know I am commenting on old code, but please have a look at: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=117887 http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=139922 both revolve around the grep in rundig (for 3.1.6) I have added the ^ in the

Re: [htdig-dev] Status of htdig for Sarge..

2004-07-14 Thread Gilles Detillieux
According to Robert Ribnitz: Hello Gabriele, Htdig does not depend on an external libdb3 /libdb2 (since it uses its own version of it, already discussed). I was thinking about externalising libhtdig (and libhtdigphp). Forcibly, those libs would be different for the two htdig versions,

Re: [htdig-dev] Htdig 3.1.6 for inclusion in Sarge...

2004-07-14 Thread Gilles Detillieux
According to Robert Ribnitz: the version of htdig 3.1.6 to include in Sarge will be the version previously in Debian, plus all patches for 3.1.6 with a later date than the version on the website. Does it make sense to not patch up? It certainly makes sense to include all bug fix patches

Re: [htdig-dev] Contrib Guides - update

2004-07-14 Thread Gilles Detillieux
According to Tony Howden: For the ht://dig webmaster, the contribs page for Guides has a broken link at the top. Search This! Searching Your Dynamic Site Using PHP3 and ht://Dig http://www.devshed.com/Server_Side/PHP/Search_This/page1.html by Colin Viebrock This link goes to a default

Re: [htdig-dev] Htdig-3.2.0b6 debian package..

2004-07-06 Thread Gilles Detillieux
Well, I suspect there's a judgement call involved in whether bug 244867 is fixed or not. If any performance slower than 3.1.6 is going to be viewed as unacceptable, then no, it's not fixed yet. However, 3.2.0b6 does fix the biggest source of bad performance in 3.2.0b5, i.e. the repeated calls to

Re: [htdig-dev] maindocs: SSI out, CSS in? (was: news.txt file gone)

2004-06-18 Thread Gilles Detillieux
All right, after about 4 months and a few news updates, I'm satisfied that my script for regenerating main.html is working satisfactorily, so I've switched the index.html and contents.html files over to using this, rather than the main.shtml file. This should help out our mirrors who don't

Re: [htdig-dev] Obsolete documentation?

2004-06-18 Thread Gilles Detillieux
According to Lachlan Andrew: The htmerge.html documentation currently says Note: You must run htmerge separately on each of the databases created by htdig before merging them together with this option. This is because merging the two wordlists together requires wordlists

Re: [htdig-dev] Gilles' patch for configuration

2004-06-10 Thread Gilles Detillieux
According to Gabriele Bartolini: this patch to go into b6, but it would be a good idea if a few other developers tested it first, as I asked back in April. All I can say is it works for me (but I tested it using flex 2.5.4a, and only on RH 9). You are right. I did not have a chance to try

Re: [htdig-dev] Re: Accents, endings and chaining

2004-06-09 Thread Gilles Detillieux
According to Lachlan Andrew: Even though Gilles has solved Dominique's problem, the issue of chaining fuzzy rules remains. One possible solution would be to make the user (administrator) specify explicit chainings. For example match_method: accents:0.9 endings:0.9

Re: [htdig-dev] Accents, endings and chaining

2004-06-08 Thread Gilles Detillieux
a problem, as for acouphène in this case. On Thu, 3 Jun 2004 02:07 am, Gilles Detillieux wrote: Dominique had written: 3- I have a problem with accents and plurals. If I search for herbe or herbes, no problem. But if the words have an accent, like acouphène, htdig has

Re: [htdig-dev] Gilles' patch for configuration

2004-06-08 Thread Gilles Detillieux
According to Lachlan Andrew: Greetings Gilles, Are you sure that Neal committed the last *.cxx builds? There's hardly a sign of him when browsing CVS. In particular, the #ifdef _WIN32 lines in conf_lexer.cxx appeared on July 21 when Gabriele applied the patch by Marco Nenciarini.

Re: [htdig-dev] Re: Accents, endings and chaining

2004-06-08 Thread Gilles Detillieux
According to Neal Richter: Here are two possible approaches: 1) Strip accents from all stored words and queries. This is a fairly common practice in search engines and NLP systems. The obvious disadvantage is that a user can't restrict results to contain that specific accent... they get back

Re: [htdig-dev] Re: Accents, endings and chaining

2004-06-08 Thread Gilles Detillieux
According to Dominique Arpin: I will install htdig 3.2 beta6 and I will try this patch. Your french.aff file only defines altstringchar entries for tex, not for latin1, so you shouldn't need the patch I mentioned. As far as I can tell, the patch won't make any difference for your affix file.

Re: [htdig-dev] bug or config problem

2004-06-03 Thread Gilles Detillieux
According to [EMAIL PROTECTED]: Perhaps my email was ambiguous. What I was planning to do is make the default value of no_next_page_text $(next_page_text), instead of next. That still gives people the option to make them different if they want to, but means that customising them both to be

Re: [htdig-dev] Re: Htdig 3.2.0 in Debian Sarge?

2004-04-26 Thread Gilles Detillieux
According to Lachlan Andrew: How does the following sound: 1. Target 3.2.0b6 for the end of May 2. This will be basically a bug-fix / optimisation from 3.2.0b5 3. It will include the following bug fixes from Joe's archive - config_parser.1 My one concern about config_parser.1 is that the

Re: [htdig-dev] Not changed on local files

2004-04-26 Thread Gilles Detillieux
According to Wim Kosten: I've got this weird problem with ht://DIG 3.1.6 I use htdig to index about 8000 PDF documents. In order to get the correct ordering on dates (PDF files and normal HTML files) these files are touched to a date as set in the database. So a document which concerns a

Re: [htdig-dev] PATCH: 3.2.0b5 config parser fixes

2004-04-26 Thread Gilles Detillieux
According to Joe R. Jah: On Fri, 23 Apr 2004, Gilles Detillieux wrote: Below is an updated patch, which should apply properly to 3.2.0b5. Thank you Guilles; this patch applies and compiles like a charm;) Hey, that's a new spelling! Most people just drop an l. ;-) I see that except one

Re: [htdig-dev] Re: Htdig 3.2.0 in Debian Sarge?

2004-04-26 Thread Gilles Detillieux
OK, I took a closer look at Dictionary.cc. I had been confusing your new word_entry objects with the DictionaryEntry ones. It seems the DictionaryEntry class handles the release() call, and shouldn't care whether the object it points to has already been deleted or not. So, I think the way

Re: [htdig-dev] PATCH: 3.2.0b5 config parser fixes

2004-04-23 Thread Gilles Detillieux
According to Joe R. Jah: patch -p0 < ../patch/config_parser.0 patching file `htcommon/conf_lexer.lxx' patching file `htcommon/conf_parser.yxx' Hunk #4 FAILED at 254. 1 out of 4 hunks FAILED -- saving rejects to htcommon/conf_parser.yxx.rej That hunk failed because I built the patch against

[htdig-dev] PATCH: 3.2.0b5 config parser fixes

2004-04-22 Thread Gilles Detillieux
Hi, folks. I was going to take another crack at the performance optimizations in HtRegexList, but before that, I got down to tackling some of the config parser bugs that had been nagging at us for quite some time. Please have a look at this patch, and test it as much as possible, to see if it

Re: [htdig-dev] Solved! - Re: Performance issue with exclude_urls

2004-04-22 Thread Gilles Detillieux
According to Christopher Murtagh: On Wed, 2004-04-21 at 23:21, Gilles Detillieux wrote: 3) We may also need to determine if the repeated calls to config->Find() at each URL are having an impact on performance as well. E.g. what is the performance cost of doing thousands of calls like

Re: [htdig-dev] PATCH: Performance issue with exclude_urls

2004-04-22 Thread Gilles Detillieux
According to me: According to Christopher Murtagh: Then I re-compiled and ran with my normal excludes URL list. It didn't seem to have much of an impact on performance. This means that the performance hit is definitely in the HtRegexList::setEscaped method. Thanks, Chris. That's good

Re: [htdig-dev] Re: Performance issue with exclude_urls

2004-04-21 Thread Gilles Detillieux
According to Christopher Murtagh: On Wed, 2004-04-21 at 09:13, Lachlan Andrew wrote: Yes, I agree that we need a more polished patch for the distribution. I still like my intermediate path: If *any* server blocks or URL blocks are used, then the user takes the performance hit and

Re: [htdig-dev] Solved! - Re: Performance issue with exclude_urls

2004-04-21 Thread Gilles Detillieux
Earlier I wrote: According to Christopher Murtagh: Easy thing to test. I'll give it a try later this week if I can, perhaps tomorrow, and report back. Great. I'll try to get my fix to Regex.cc in by the end of the week too, so it would be great if you could give it a whirl. It would

Re: [htdig-dev] Using an external digger

2004-04-14 Thread Gilles Detillieux
According to Anuradha Ratnaweera: I am `somewhat' new to this list. ;-) And this is a continuation of the following discussion: http://sourceforge.net/mailarchive/message.php?msg_id=7688630 ... We see two possibilities. - To add a feature to htdig so that it can use an external

Re: [htdig-dev] news.txt file gone

2004-02-20 Thread Gilles Detillieux
According to Jesse op den Brouw: when accessing the htdig main website, file main.shtml does not contain the included news.txt file, although the string is definitely in the file itself. Can anyone verify this? Yes, there seems to have been a hiccup in the news update script

Re: [htdig-dev] news.txt file gone

2004-02-20 Thread Gilles Detillieux
According to me: Another way might be to do away with main.shtml and news.txt from the maindocs tree altogether, and just have a main.html file, as before the SourceForge days. Only difference is this time, the news section in main.html would be between clear delimiters, and the news-get.sh

Re: [htdig-dev] found possible bug in htsearch

2004-02-19 Thread Gilles Detillieux
According to Christopher Murtagh: On Wed, 2004-02-18 at 17:39, Gilles Detillieux wrote: Hmm. This would suggest that the '/' has a special meaning to your regcomp() function. How about you try adding a / character to the list of special characters in the strchr() call on line 81 of htlib

Re: [htdig-dev] found possible bug in htsearch

2004-02-19 Thread Gilles Detillieux
According to Christopher Murtagh: On Thu, 2004-02-19 at 12:29, Gilles Detillieux wrote: Yes, unless you configure 3.2.0b5 with --disable-shared, then htlib/HtRegex.cc will wind up in the libht.so shared library, not right in the htsearch binary. So, you'd need to install the rebuilt shared

Re: [htdig-dev] found possible bug in htsearch

2004-02-18 Thread Gilles Detillieux
According to Christopher Murtagh: ht://Dig 3.2.0b5-20040125, two almost identical searches via command line (RedHat Enterprise AS). This one: ./htsearch -v -c ../conf/search.conf 'restrict=music;config=search;method=and;sort=score;matchesperpage=8;words=student;page=1;' returns

Re: [htdig-dev] found possible bug in htsearch

2004-02-18 Thread Gilles Detillieux
According to Christopher Murtagh: On Wed, 2004-02-18 at 17:39, Gilles Detillieux wrote: Hmm. This would suggest that the '/' has a special meaning to your regcomp() function. How about you try adding a / character to the list of special characters in the strchr() call on line 81 of htlib

Re: [htdig-dev] ignore_dead_servers (was: Proxy HTTP proposal of change due to possible bug)

2004-02-05 Thread Gilles Detillieux
According to Gabriele Bartolini: We successfully tested all the HTTP protocol related attributes, except ignore_dead_servers: indeed, we have not seen any difference in switching it off/on as when we try to index a host that does not exist, robots.txt retrieval fails and no other document

Re: [htdig-dev] Db path hardcoded into htdig somewhere?

2004-02-05 Thread Gilles Detillieux
According to BOOTH, Nicholas, FM: I've tried a couple of variations of compiling the [3.2.0b5] suite, and using a number of configuration options when running ./configure prior to make/make install. In particular I'm trying to govern where the files will end up (I don't want to use

Re: [htdig-dev] Proxy HTTP proposal of change due to possible bug

2004-02-02 Thread Gilles Detillieux
According to Gabriele Bartolini: On Fri, 2004-01-30 at 17:59, Gilles Detillieux wrote: It certainly seems like a bug to me, and your patch seems correct as far as I can tell. However, I don't see the need for supporting a dash or noproxy. With your patch, if you set http_proxy

Re: [htdig-dev] Re: Feature request

2003-12-16 Thread Gilles Detillieux
According to Ben Scott: On Fri, Dec 12, 2003 at 10:26:00PM +, Mike Holderness wrote: Howabout (ab)using the URL rewrite facility? Or does it not work on query strings? Mike I would have to get the webserver to rewrite what it presents to htdig, which would be... nontrivial.

Re: [htdig-dev] parsing pdf-files

2003-11-21 Thread Gilles Detillieux
According to Zöllner, Raymond: I'm using SuSE Linux 8.1, installed ht://Dig 3.2.0b5 for several virtual-webs under tomcat and it works fine. The only problem is to parse pdf-files. When I use ht://Dig 3.2.0b5 I get the error message on digging a pdf-file:

Re: [htdig-dev] Concurrent indexes?

2003-11-21 Thread Gilles Detillieux
According to Christopher Murtagh: In some of my tests, I came across this error: WordDB: CDB___memp_cmpr_read: unable to uncompress page at pgno = 402 WordDB: PANIC: Input/output error DB_RUNRECOVERY: Fatal error, run database recovery and I realize that this might be due to the method

Re: [htdig-dev] Numbered HTML Entities mangled in Result Blurbs

2003-11-21 Thread Gilles Detillieux
According to Neal Richter: On Wed, 19 Nov 2003, Gilles Detillieux wrote: According to Neal Richter: This error is happening in the DISPLAY of the excerpts... so it seems like looking for &#XXX; patterns and NOT encoding them before display is a reasonable strategy... the browser

Re: [htdig-dev] Numbered HTML Entities mangled in Result Blurbs

2003-11-19 Thread Gilles Detillieux
According to Neal Richter: This error is happening in the DISPLAY of the excerpts... so it seems like looking for &#XXX; patterns and NOT encoding them before display is a reasonable strategy... the browser will decide how to display it. That would be a reasonable compromise, but note that

Re: [htdig-dev] Updating document titles

2003-11-19 Thread Gilles Detillieux
According to Wim Kosten: Using the external converter switch (application/pdf->text/html) I index PDF files, which works perfectly; however, the PDF files (about 8000) all have undescriptive titles like Word doc 2 instead of Proposal for the yearly members meeting. In order to properly show

Re: [htdig-dev] Possble Bugs, esp with htdig.conf

2003-11-18 Thread Gilles Detillieux
According to Gilles Detillieux: According to BOOTH, Nicholas, FM: 3] If there is _not_ a return after the last line in the config file then htsearch causes a cgi error. Results from apache error log: Unknown char in line 224: #[Fri Nov 14 23:51:46 2003] [error] [client 147.114.74.200

Re: [htdig-dev] Numbered HTML Entities mangled in Result Blurbs

2003-11-18 Thread Gilles Detillieux
According to Neal Richter: I am seeing some HTML entities show up in search result 'blurbs'. See below. Basically any entity of this form &#XXX; gets translated to &amp;#XXX; &#153; -- &amp;#153; This only happens for numbered entities below 160. &#160; -- &nbsp; &#169; -- &copy; &#174; --

Re: [htdig-dev] 3.2.0b5 Testing

2003-11-17 Thread Gilles Detillieux
According to Joe R. Jah: On Fri, 14 Nov 2003, Gilles Detillieux wrote: endings and synonyms are generated a bit differently, and most importantly are built in a temporary spot then moved into place, so they don't collide with any existing databases until the new one is complete. soundex

Re: [htdig-dev] purge and re-index failing

2003-11-17 Thread Gilles Detillieux
According to Neal Richter: 1) This combination of command line options is not playing well together. No, they won't play well when you don't follow the correct syntax. Thanks Gilles... should we put the '-' stdin option on our list of desired features for 3.2.0??? I was looking

Re: [htdig-dev] purge and re-index failing

2003-11-17 Thread Gilles Detillieux
According to Christopher Murtagh: On Mon, 2003-11-17 at 16:20, Gilles Detillieux wrote: The -m option MUST be followed by a file name, and this file must be a list of one or more URLs to add to the index. The htdig.html page is a tad misleading, as it shows [url_file] in brackets, which

Re: [htdig-dev] robots.txt not working

2003-11-14 Thread Gilles Detillieux
According to Andy Lewis: Look like the robots.txt file isn't being parsed properly. I've used the http://www.jumboclassifieds.com/~alewis/attrs.html#robotstxt_name robotstxt_name tag and added the same name to my robots.txt file and I still see the default htdig name when indexing.

Re: [htdig-dev] 3.2.0b5 Testing

2003-11-14 Thread Gilles Detillieux
According to Joe R. Jah: On Thu, 13 Nov 2003, Gilles Detillieux wrote: According to Joe R. Jah: htfuzzy metaphone dumps core, but it works fine with endings, etc. Just a hunch, but I'd guess that you had a metaphone database left over from 3.1.6, and that the newer DB code in 3.2.0b5

Re: [htdig-dev] Next steps

2003-11-13 Thread Gilles Detillieux
According to Gabriele Bartolini: I am going over the release files right now. I will check that everything is fine and then: 1) update the CVS tag 2) rebuild the package 3) put it: - under the files section of the site - on sourceforge.net 4) publish it on: -

Re: [htdig-dev] Almost there...

2003-11-13 Thread Gilles Detillieux
According to Joe R. Jah: Job well done! It configured/built/ran out of the box on my BSD/OS-4.3.1 with gcc 2.95.3 like a charm; It took only 96 minutes to index my site;) How does this compare to earlier 3.2.0b4 snapshots, and to 3.1.6? Is 3.2.0b5 significantly slower than 3.1 releases, and is

Re: [htdig-dev] 3.2.0b5 Testing

2003-11-13 Thread Gilles Detillieux
According to Joe R. Jah: htfuzzy metaphone dumps core, but it works fine with endings, etc. Just a hunch, but I'd guess that you had a metaphone database left over from 3.1.6, and that the newer DB code in 3.2.0b5 doesn't like it. We had put some tests for this in some of the other programs,

Re: [htdig-dev] Almost there...

2003-11-10 Thread Gilles Detillieux
According to Lachlan Andrew: The current status: - the tar balls have been regenerated using the modified form of Gabriele's script. - maindocs/dev/htdig-3.2 have been updated (although the changes haven't yet been reflected at www.htdig.org, so I'm not sure I did it correctly).

Re: [htdig-dev] Almost there...

2003-11-10 Thread Gilles Detillieux
According to Neal Richter: - Create diff tarballs (or change FAQ 2.5). Joe R Jah? Looks like he maintains a ftp patch download area @ ftp://ftp.ccsf.org/htdig-patches/ Yes, his patch site is still there, but that's just a side-note in FAQ 2.5. That's where you can get patches for bug

Re: [htdig-dev] Release of 3.2.0b5 :)

2003-11-09 Thread Gilles Detillieux
According to Gabriele Bartolini: I've updated .version, FAQ.html and README, and written a draft entry for RELEASE.html. I haven't touched the dev/htdig-3.2 files in maindocs, but I've updated the main maindocs FAQ. WELL DONE! Thanks so much, Lachlan! Hear, hear!!! Excellent job.

Re: [htdig-dev] Release of 3.2.0b5 :)

2003-11-09 Thread Gilles Detillieux
What's missing in this script is Geoff's script to fix up all the file permissions. I think this is a must for a release tarball. According to Gabriele Bartolini: just to let you know that I have built the tarball using a very simple script (you can find it at the bottom of this e-mail)

Re: [htdig-dev] Release of 3.2.0b5 :)

2003-11-09 Thread Gilles Detillieux
According to Jesse op den Brouw: there's one thing that puzzles me: where is 3.2.0b4 ??? We decided to skip over it to avoid the confusion caused by 2+ years of 3.2.0b4 snapshots out there, incorporated into RPMs and other packages, as well as a lot of test or even production installations.

Re: [htdig-dev] patch for bug #504087, duplicate URLs in htsearch collections

2003-10-28 Thread Gilles Detillieux
Yesterday, I wrote: I've pretty much run out of time to help out with this release, but before I leave you, I thought I'd submit the following patch for your testing and approval. It should fix the duplicate URL problem in htsearch collections, in bug #504087. I'm not sure what sort of

Re: [htdig-dev] [Task #87784] Parsing

2003-10-27 Thread Gilles Detillieux
According to SourceForge.net: Task #87784 has been updated. Project: ht://Dig Subproject: Testing 3.2 Summary: Parsing Complete: 0% Status: Open Authority : nealr Assigned to: nobody Description: Use the documentation to test each one of these config verbs #parsing

[htdig-dev] patch for bug #504087, duplicate URLs in htsearch collections

2003-10-27 Thread Gilles Detillieux
I've pretty much run out of time to help out with this release, but before I leave you, I thought I'd submit the following patch for your testing and approval. It should fix the duplicate URL problem in htsearch collections, in bug #504087. I'm not sure what sort of performance impact it will

Re: [htdig-dev] head_before_get attribute (was: 3.2RC1 Feature Freeze)

2003-10-25 Thread Gilles Detillieux
According to Gabriele Bartolini: At 13.45 22/10/2003 -0600, Neal Richter wrote: OK, you've convinced me, it IS useful to have this switch be user controlled.. I wasn't aware of the non-compliant servers causing an issue. Clearly 'automatic' behavior in that case is a bad thing. Go with

Re: [htdig-dev] htsearch 3.2 rejects records with modtime of 0?

2003-10-25 Thread Gilles Detillieux
According to me: According to Lachlan Andrew: I've applied a better patch. The default for startyear is empty, and it is documented that *if* a start/end date is specified then it defaults to 1970. Gilles, could you please verify that this fixes the bug, and close the report?

[htdig-dev] not all 3.1.6 bug fix patches in 3.2 yet

2003-10-25 Thread Gilles Detillieux
Hi, folks! Sorry to throw another curve ball your way, but while checking the 3.2 Display.cc code against my patched 3.1.6 code, I came to the realisation that not all bug fixes for the 3.1.6 code, available at ftp://ftp.ccsf.org/htdig-patches/3.1.6/, and which would still be relevant/applicable

[htdig-dev] htsearch 3.2 rejects records with modtime of 0?

2003-10-21 Thread Gilles Detillieux
Hey, guys. I ran into something weird when I was testing out the allow_numbers changes last week, which I haven't been quite able to explain or track down in the code. Of the pages on my site that I was indexing, about a dozen of them were from a CGI script that puts out a Last-Modified header

Re: [htdig-dev] head_before_get attribute (was: 3.2RC1 Feature Freeze)

2003-10-20 Thread Gilles Detillieux
According to Gabriele Bartolini: I think what we've had here is informative debate. You as much as anyone else wrote the networking code, so for me it's your decision. I think the new TRUE default is fine. OK. Any other opinions? I think it was just a matter of not understanding what

Re: [htdig-dev] allow_numbers

2003-10-14 Thread Gilles Detillieux
According to Lachlan Andrew: On reflection, I think the behaviour that seems to have been intended is better. I've filed a bug report (with patch) to implement: 1. If allow_numbers is false, words must contain at least one non-digit (2001 not a word, X11 is). 2. If allow_numbers is true,
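A minimal sketch of rule 1 only, since rule 2 is cut off in the excerpt above. This is an illustration in Python, not ht://Dig's C++ word-parsing code; the function name is hypothetical, and real word splitting (punctuation, minimum length) is ignored.

```python
def is_word_when_numbers_disallowed(token):
    """Rule 1 from the excerpt: with allow_numbers false, a token
    counts as a word only if it has at least one non-digit character."""
    return any(not ch.isdigit() for ch in token)


# The two examples given in the excerpt.
print(is_word_when_numbers_disallowed("2001"))  # all digits, so not a word
print(is_word_when_numbers_disallowed("X11"))   # contains a letter, so a word
```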

Re: [htdig-dev] Re: [htdig-updates] htdig update

2003-10-08 Thread Gilles Detillieux
According to Jim Cole: The patch I provided is apparently not a complete fix. Gilles shot it down, pointing out that it misses at least one case. He offered an alternative patch, but asked for feedback before committing it. I don't think the person that initially reported the problem ever

Re: [htdig-dev] Re: Logical Error in Indexer???

2003-10-07 Thread Gilles Detillieux
According to Gabriele Bartolini: Nope, if head_before_get=TRUE we use the HEAD request and the HTTP server is kind enough to give us the timestamp on the document in the header. If the timestamps are the same we don't bother to download it. Yep, you are right. I remember that was one of
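The head_before_get decision described above can be sketched roughly as follows. This is an illustration in Python, not ht://Dig's own code; the function name is hypothetical, and falling back to a full download when either timestamp is missing is an assumption the excerpt does not state.

```python
from email.utils import parsedate_to_datetime


def needs_download(stored_last_modified, head_last_modified):
    """Decide whether to GET a document after a HEAD request.

    Both arguments are HTTP date strings (or None). If the timestamps
    are the same, the download is skipped, as the excerpt describes.
    """
    # Assumption: without both timestamps we cannot compare, so fetch.
    if not stored_last_modified or not head_last_modified:
        return True
    stored = parsedate_to_datetime(stored_last_modified)
    current = parsedate_to_datetime(head_last_modified)
    return stored != current


print(needs_download("Tue, 07 Oct 2003 10:00:00 GMT",
                     "Tue, 07 Oct 2003 10:00:00 GMT"))
```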

Re: [htdig-dev] Re: Logical Error in Indexer???

2003-10-04 Thread Gilles Detillieux
According to Neal Richter: On Fri, 3 Oct 2003, Lachlan Andrew wrote: I'm not sure that I understand this. If a page 'X' is linked only by a page 'Y' which isn't changed since the previous dig, do we parse the unchanged page 'Y'? If so, why not run htdig -i? If not, how do we know that

Re: [htdig-dev] Configuration patch

2003-08-27 Thread Gilles Detillieux
According to Gabriele Bartolini: I don't know why it was broken, whereas when I did it at commit time it worked for me; it's just that before leaving for Australia I had to set up many things and I lost all the contents of my hard drive 2 days before taking off!!! For sure I forgot to

Re: [htdig-dev] Configuration patch

2003-07-30 Thread Gilles Detillieux
According to Jim Cole: Hi - As someone already mentioned with regard to their Red Hat install, the configuration changes to the 3.2 code base require auto* tools more recent than those provided with the distribution. The same is true for OS X, which currently provides autoconf 2.52 and

Re: [htdig-dev] Accessible version for ht:/dig: AAA- WAI

2003-07-17 Thread Gilles Detillieux
According to Gabriele Bartolini: If you have some code to contribute, I'd love to hack the main CVS tree and make your changes available in the next release (hopefully soon!). Main changes I was thinking about regard mainly images and forms (with proper labels). ... On Wed,

Re: [htdig-dev] htdig-3.2.0b4-20030622 bug in robots.txt processing

2003-07-08 Thread Gilles Detillieux
According to Jim Cole: On Friday, June 27, 2003, at 02:00 PM, Patrick Robinson wrote: I just installed htdig-3.2.0b4-20030622, and discovered that it's not correctly handling Disallow: patterns from my robots.txt file. (I'm hoping this is the correct list to post this!) I have these

Re: [htdig-dev] Re: Feedback on configuring ht://Dig?

2003-06-13 Thread Gilles Detillieux
. :) On 06/12/03 15:29, Gilles Detillieux wrote: They point it to /var/www/html/htdig. Actually, they do lots of really dumb things in their RPM package. First of all they set common_dir and image_dir to the same thing, then they move /var/www/html/htdig to /usr/share/htdig. In previous RPMs

Re: [htdig-dev] MATCH_MESSAGE

2003-06-12 Thread Gilles Detillieux
According to Lachlan Andrew: I've noticed that MATCH_MESSAGE is not internationalised. It is set to all or some, rather than the values in method_names. Moreover, if method_names doesn't use names and and or, then it isn't set at all. Does anyone mind if I change the behaviour

Re: [htdig-dev] Re: Feedback on configuring ht://Dig?

2003-06-12 Thread Gilles Detillieux
According to Lachlan Andrew: I'm sorry, I misread your bug report. I take it that common_dir *wasn't* /usr/share/htdig in the RedHat RPM, but that it should have been. If the packagers configured it with --prefix=/usr (which the package information says they did), then the default

Re: [htdig-dev] Re: Red Hat

2003-06-12 Thread Gilles Detillieux
As far as I know, Apache has always allowed putting all access.conf and srm.conf directives right in httpd.conf. The 3 file setup was a carryover from NCSA httpd, but I don't think Apache 1.x ever enforced any context-specific stuff in any of the files. With Apache 2, they've done away with the

Re: [htdig-dev] 3.2.0b4 Progress check :)

2003-06-08 Thread Gilles Detillieux
According to Lachlan Andrew: Gilles: How is the Latin encodings going? If it is simply a forward port, I can try it. If you want new features, I'll leave it to you. It's not a forward port, as the SGML decoding is quite different in 3.2. I was busy the past couple weeks with a server

Re: [htdig-dev] 3.2.0b4 Progress check :)

2003-06-08 Thread Gilles Detillieux
According to Lachlan Andrew: What docs need to be updated? The www.htdig.org FAQ is much bigger than the distribution FAQ. Should it be copied over (even though a lot of it relates to 3.1)? Geoff has a pre-release check list he uses to determine which docs are forward ported or back ported

Re: [htdig-dev] Current Status as of snapshot 3.2.0b4-20030525

2003-05-29 Thread Gilles Detillieux
According to Neal Richter: Question: I've got a bunch of native win32 support changes I've been sitting on (and am now reverifying). Should I sit on them until we release 3.2.0b5?? What about fixes for memory leaks??? I'm back to a period here at work where I can work full-time on

Re: [htdig-dev] Re: Importing cookies

2003-02-21 Thread Gilles Detillieux
According to Gabriele Bartolini: been set to a new HtCookieInFileJar object, which doesn't get deleted. Shouldn't the delete cookie_file statement be moved outside of the innermost if clause, and past the end of the else clause? It is deleted later by the main function, through the base

Re: [htdig-dev] Re: Importing cookies

2003-02-21 Thread Gilles Detillieux
According to Gabriele Bartolini: Here is the description for cookies_input_file: Set the input file to be used when importing cookies for the crawl; cookies must be specified according to Netscape's format (tab separated fields). For more information, give a look at the example cookies

Re: [htdig-dev] Problem with illegal instruction with Version 3.2.0.b3 on AIX 5.1L (RS6000)

2003-02-19 Thread Gilles Detillieux
According to Martin Laarz: I try to compile the 3.2.0.b3 Version on an AIX 5.1L Partition using gcc 2.9-aix51-020209 on a RS6000 (IBM Regatta). What I need from the newer Version is the phrases feature. During the compilation phase I got these warnings: ... At least I got my binaries,

Re: [htdig-dev] Re: Residual database errors

2003-02-18 Thread Gilles Detillieux
According to Neal Richter: Question: Your message below points to an error on page 26613. Your previous message pointed to an error on page 33. Is the error a moving target? ;-) I think I reduced the data set slightly, but yes, the target does seem to move :( As the attached file

Re: [htdig-dev] HTTP-header-line may not be parsed correctly

2003-02-14 Thread Gilles Detillieux
While I was waiting for the other shoe to drop, Frank Passek wrote: I encountered the following problem with the versions 3.1.6 and 3.2.0.b3. htdig cannot parse HTTP-header lines when there is no blank after the colon, as in content-type:text/html This problem may be solved by replacing
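The parsing problem Frank describes can be illustrated with a small sketch. This is Python for illustration only, not the htdig C++ code and not the replacement Frank proposed (which is cut off above); the function name is hypothetical. It splits on the first colon and trims whitespace, so a header is accepted with or without a blank after the colon.

```python
def parse_header_line(line):
    """Split one HTTP header line into (lowercased name, value).

    Whitespace after the colon is optional, so both
    'content-type:text/html' and 'Content-Type: text/html' parse.
    """
    name, sep, value = line.partition(":")
    if not sep:
        raise ValueError("not a header line: %r" % line)
    return name.strip().lower(), value.strip()


print(parse_header_line("content-type:text/html"))
print(parse_header_line("Content-Type: text/html"))
```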

Re: [htdig-dev] Re: ssl and 3.2.0b4-20021110

2003-01-14 Thread Gilles Detillieux
According to Budd, Sinclair: Hello After much scrambling, I found the script handler.pl by Geoffrey Hutchison [EMAIL PROTECTED] which is used with external_protocols to fetch https pages. IT WORKS A WONDER. There is a small documentation issue . In attrs.html under

Re: Fwd: [htdig-dev] defaults.cc

2003-01-10 Thread Gilles Detillieux
According to Geoff Hutchison: It's probably my fault if this didn't end up in the CVS. I'll take a look later this morning. But AFAIK, this is the latest version of the defaults.xml builder. Brian had also posted earlier patches and scripts, which were archived here:

Re: [htdig-dev] Re: splitting attrs.html

2003-01-09 Thread Gilles Detillieux
According to Ted Stresen-Reuter: I've reviewed, briefly, both files mentioned. I'm not sure what their purpose is in terms of the whole project (I've been following the discussion, but not closely - seems like these are default values for every attribute based on the name), but it seems

Re: [htdig-dev] Re: Forward porting changes for 3.1.6 to 3.2.0b4

2002-11-28 Thread Gilles Detillieux
According to Lachlan Andrew: Greetings, On Wed, 27 Nov 2002 15:53, Gilles Detillieux wrote: According to Geoff Hutchison: Is the policy to have all possible stemmings, even if they are non-words, like unrealises? No, and I'd expect that ispell doesn't want them either

Re: [htdig-dev] Re: Forward porting changes for 3.1.6 to 3.2.0b4

2002-11-26 Thread Gilles Detillieux
According to Geoff Hutchison: Is the policy to have all possible stemmings, even if they are non-words, like unrealises? If so, we can really go to town on the affixes :) No, and I'd expect that ispell doesn't want them either. Of course many people have moved away from ispell too...

Re: [htdig-dev] ht://dig feature request...

2002-11-22 Thread Gilles Detillieux
According to Carl Holtje: Hoping this is a good place to post this... As a configuration option, available to pages like the footer or header, I'd like to be able to put the modification date of one of the database files... I'm not completely convinced the full time is necessary, but at

Re: [htdig-dev] Questions

2002-11-07 Thread Gilles Detillieux
According to Lachlan Andrew: - What is the difference between Dictionary::Destroy() and Dictionary::Release() ? Dictionary entries associate a particular name or keyword with a pointer to an Object. Generally, but not necessarily always, when adding items to a Dictionary, you allocate a

Re: [htdig-dev] Base URL problem

2002-11-04 Thread Gilles Detillieux
because I didn't ask him immediately the version he's using - this bug could already be solved). As Hans reported, a newer snapshot did solve the problem. I suspect this set of changes from just over a year ago was the cure... Fri Sep 14 22:12:56 2001 Gilles Detillieux [EMAIL PROTECTED

Re: [htdig-dev] db.words.db using only key and empty value?

2002-11-02 Thread Gilles Detillieux
According to Neal Richter: On Tue, 22 Oct 2002, Geoff Hutchison wrote: On Tuesday, October 22, 2002, at 02:50 PM, Neal Richter wrote: It looks to me like the db.words.db is using only a 'key' value, and has a blank 'value' for each and every key! Nope. Remember that value as it

Re: 3.2.0b4 release [htdig-dev] (was 3.2 Stability)

2002-10-30 Thread Gilles Detillieux
According to Jim Cole: On Friday, October 18, 2002, at 02:12 PM, Gilles Detillieux wrote: Well, since you're offering, I wouldn't mind a hand in going through the 3.1.6 ChangeLog to see what other changes need to go in 3.2 still. I'd like to add to the list above to make it a complete list

Re: 3.2.0b4 release [htdig-dev] (was 3.2 Stability)

2002-10-18 Thread Gilles Detillieux
be able to trust that it's actually the official 3.2.0b4 release? Would it be helpful to skip b4 and jump to b5? On Thu, 17 Oct 2002, Gilles Detillieux wrote: 3) My own lack of time in being able to get the 3.1.6 fixes/updates forward ported to 3.2. If you have a list of particular things

Re: [htdig-dev] Re: 3.2 Stability

2002-10-17 Thread Gilles Detillieux
According to Neal Richter: My previous post with the proposed schedule could be restated with the releases as 3.2beta4, 3.2beta5, 3.2beta6 etc. I guess it comes down to that I think the code is good enough now to consider a release in the near-term without a raft of changes/improvements.

Re: [htdig-dev] patch for argument handling in htsearch

2002-10-04 Thread Gilles Detillieux
According to Lachlan Andrew: I've almost finished some patches trying to address the htsearch input parameters issue (below). I've updated defaults.cc to list all variables used by any of the programs (according to grep config), and described them as best I can. Where they are

Re: [htdig-dev] XML version of defaults.cc

2002-10-04 Thread Gilles Detillieux
According to Geoff Hutchison: OK, I saw that on my TODO list today and added a defaults.xml to the mainline CVS. It's also gzip'ed and attached to this e-mail. It's a first draft, so to speak and I'm not entirely sure it's all well-formed XML or that the DTD is set. So please make

[htdig-dev] Re: [ htdig-Bugs-614270 ] Scoring

2002-10-01 Thread Gilles Detillieux
Comment By: Gilles Detillieux (grdetil) Date: 2002-09-25 06:57 Message: It's not a bug, it's a lack of a feature. Doing just what you propose has been suggested before, and it will eventually find its
