Re: [htdig-dev] Retriever/Parser

Geoff Hutchison Wed, 06 Feb 2002 19:24:17 -0800

I'm going to take your points in a slightly different order, so I 
apologize to those following the thread.

At 6:54 PM -0700 1/29/02, Neal Richter wrote:
>       One could make an argument that mifluz could be used directly for
>this.  Very true, but mifluz is a bucket of nice parts.  Htdig is a
>working tool with the wrappers that make mifluz useful quickly.

I'm not sure Loic ever saw it that way, but that's another story. He 
certainly was looking for very similar things in terms of using 
ht://Dig in other contexts. But we also haven't heard how his designs 
have changed over the years.

>       At some point the Retriever-as-swiss-army-knife approach can be
>overly complex.  A more basic class for optional use can be good for a
>narrow set of uses.

Fair enough, but again, this sounds more like a refactoring of htdig. 
Michael Haggerty worked on some things that may or may not be of 
interest (I haven't seen everything he did to htdig/ myself). If you 
have the CVS repository around, you should check the 
mrh-refactor-htdig branch. If not, let me know and I'll pull together 
a .tar.gz of that.

Can we come up with other types of Retriever classes beyond the 
"here's a document in memory, index it" and "here's a URL, fetch it, 
check status, index and spider" approaches?

>       For this project, all I really store as a 'URL' is part of the
>path to an XML file, so by itself the URL is useless to any transport
>object.  For that matter you could use URL simply as a document-id in
>another separate system.
...
>       Similarly the query process is integrated inside another
>UI.  A Query is received via user input, passed to htdig search APIs and
>the results are repackaged within the existing UI.

I'm curious about how the URL figures into the search results. 
Indeed, the 3.2 code relies very little on the URL for indexing, since 
everything is keyed by DocID. This is in contrast to 3.1 and earlier, 
where the URL was the key to the document database.
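To make the DocID point concrete, here is a minimal C++ sketch. The class and field names are hypothetical and are not htdig's actual document database API; the sketch only shows the idea that records are looked up by an integer DocID while the URL is just another stored field, so it can just as well hold an XML path or an external document id.

```cpp
#include <map>
#include <string>

// Hypothetical document record; NOT htdig's real DocumentRef class.
struct DocRecord {
    std::string url;    // stored as a plain attribute; may be an opaque id
    std::string title;
};

// Hypothetical 3.2-style store: the primary key is a numeric DocID,
// not the URL (3.1 and earlier keyed the document database by URL).
class DocDB {
public:
    // Assigns the next DocID and stores the record under it.
    int add(const std::string& url, const std::string& title) {
        int id = next_id_++;
        docs_[id] = DocRecord{url, title};
        return id;
    }

    // Returns the record for a DocID, or nullptr if none exists.
    const DocRecord* find(int id) const {
        auto it = docs_.find(id);
        return it == docs_.end() ? nullptr : &it->second;
    }

private:
    std::map<int, DocRecord> docs_;
    int next_id_ = 1;  // DocIDs start at 1 in this sketch
};
```

Because lookups go through the DocID, nothing in this store requires the URL to be fetchable by any transport.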

What do you present in the search results? How does a user select a 
particular document? Is it a link that fetches the document based on the 
DocID? This may help the people who've asked whether htdig could not 
only fetch the document but also keep a local copy, a la Google's 
"Cached Results" feature.

-Geoff

_______________________________________________
htdig-dev mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/htdig-dev
