Geoff Hutchison writes:
> I'm curious about the refactoring. Did you get things to your
> satisfaction? If so, how much did you end up changing? If not, what
> questions do you have?
I worked on it for 3 or 4 solid days a week ago, but since then I have
been at the Python9 conference in Long Beach. I will get back to it
tomorrow.
Here is my progress so far (mostly in CVS in my experimental branch):
I have separated out the logic for retrieving documents from the
Document class into several specialized classes called ExternalSource,
FileSource, NNTPSource, and Server (which would otherwise be known as
HTTPSource). These are all derived from an abstract base class called
Source.
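
To make the shape of this hierarchy concrete, here is a minimal sketch.
Only the class names come from the description above; the kind() and
retrieve() methods, the make_source() factory, and the URL prefixes it
checks are assumptions made for illustration, and the subclass bodies
are stubs rather than real transport code.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Abstract base: what the rest of the digger needs to know about fetching.
// The method names and signatures are assumptions, not ht://Dig's API.
class Source {
public:
    virtual ~Source() {}
    virtual std::string kind() const = 0;
    // Fill `body` with the document contents; return false on failure.
    virtual bool retrieve(const std::string& url, std::string& body) = 0;
};

// Stub subclasses. Each would hold the transport-specific logic that
// used to live in Document; here they only return placeholder text.
class FileSource : public Source {
public:
    std::string kind() const override { return "file"; }
    bool retrieve(const std::string& url, std::string& body) override {
        body = "[file stub] " + url;
        return true;
    }
};

class NNTPSource : public Source {
public:
    std::string kind() const override { return "nntp"; }
    bool retrieve(const std::string& url, std::string& body) override {
        body = "[nntp stub] " + url;
        return true;
    }
};

class Server : public Source {  // i.e. what would otherwise be HTTPSource
public:
    std::string kind() const override { return "http"; }
    bool retrieve(const std::string& url, std::string& body) override {
        body = "[http stub] " + url;
        return true;
    }
};

class ExternalSource : public Source {
public:
    std::string kind() const override { return "external"; }
    bool retrieve(const std::string& url, std::string& body) override {
        body = "[external stub] " + url;
        return true;
    }
};

static bool has_prefix(const std::string& s, const std::string& p) {
    return s.compare(0, p.size(), p) == 0;
}

// Hypothetical factory: choose a Source from the URL prefix and fall
// back to ExternalSource for anything unrecognized.
std::unique_ptr<Source> make_source(const std::string& url) {
    if (has_prefix(url, "file:")) return std::unique_ptr<Source>(new FileSource);
    if (has_prefix(url, "news:")) return std::unique_ptr<Source>(new NNTPSource);
    if (has_prefix(url, "http:")) return std::unique_ptr<Source>(new Server);
    return std::unique_ptr<Source>(new ExternalSource);
}

// Usage sketch: the caller never needs to know the concrete class.
inline void source_demo() {
    std::string body;
    std::unique_ptr<Source> src = make_source("file:///tmp/index.html");
    assert(src->kind() == "file");
    assert(src->retrieve("file:///tmp/index.html", body));
}
```

The point of the sketch is only that Document no longer has to know
which transport is in use; it talks to a Source and nothing else.
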
Meanwhile I am moving the code for processing a retrieved document
into a separate facility that I am calling a Tallier. This class is
responsible for counting word frequencies, storing the needed info in
the words databases, etc. This will be the class that has most of the
Parsable callbacks like got_word, got_title, etc. In principle, other
types of Tallier could be plugged in or used in combination with the
existing ones.
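
As a rough illustration of "plugged in or used in combination", the
sketch below defines a Tallier interface with the two callbacks named
above and a group that forwards each callback to several talliers. The
simplified signatures and the WordFrequencyTallier and TallierGroup
classes are hypothetical; the real callbacks take more arguments, and a
real Tallier would write to the words databases rather than to a map.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Receiver of parse events. Signatures are simplified assumptions.
class Tallier {
public:
    virtual ~Tallier() {}
    virtual void got_word(const std::string& word) = 0;
    virtual void got_title(const std::string& title) = 0;
};

// Counts word frequencies in memory (stand-in for the words databases).
class WordFrequencyTallier : public Tallier {
public:
    void got_word(const std::string& word) override { ++counts_[word]; }
    void got_title(const std::string& title) override { title_ = title; }

    int count(const std::string& word) const {
        std::map<std::string, int>::const_iterator it = counts_.find(word);
        return it == counts_.end() ? 0 : it->second;
    }
    const std::string& title() const { return title_; }

private:
    std::map<std::string, int> counts_;
    std::string title_;
};

// Forwards every callback to each member, so several talliers can be
// used in combination. Members are borrowed, not owned.
class TallierGroup : public Tallier {
public:
    void add(Tallier* t) { members_.push_back(t); }
    void got_word(const std::string& word) override {
        for (std::size_t i = 0; i < members_.size(); ++i)
            members_[i]->got_word(word);
    }
    void got_title(const std::string& title) override {
        for (std::size_t i = 0; i < members_.size(); ++i)
            members_[i]->got_title(title);
    }

private:
    std::vector<Tallier*> members_;
};

// Usage sketch: one event stream, one tallier behind a group.
inline void tallier_demo() {
    WordFrequencyTallier freq;
    TallierGroup group;
    group.add(&freq);
    group.got_word("dig");
    group.got_word("dig");
    assert(freq.count("dig") == 2);
}
```
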
Finally I will create a class called a Crawler (or maybe I'll use
Retriever) which coordinates the traversal of the doc tree. Its only
callback from the Parsable will be got_href, which obviously it needs
in order to continue the crawl.
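
A minimal sketch of that traversal, under heavy assumptions: the
in-memory LinkGraph below stands in for fetching a page through a
Source and parsing it, and the breadth-first order and the crawl()
method are illustrative choices, not a description of the actual code.
Only the names Crawler and got_href come from the text above.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for "fetch and parse": page URL -> hrefs on it.
typedef std::map<std::string, std::vector<std::string> > LinkGraph;

class Crawler {
public:
    explicit Crawler(const LinkGraph& web) : web_(web) {}

    // The single parse callback the crawler cares about: queue each
    // URL the first time it is seen.
    void got_href(const std::string& url) {
        if (seen_.insert(url).second) queue_.push_back(url);
    }

    // Breadth-first traversal starting from `start`. Returns the URLs
    // in the order they were visited.
    std::vector<std::string> crawl(const std::string& start) {
        std::vector<std::string> visited;
        got_href(start);
        while (!queue_.empty()) {
            std::string url = queue_.front();
            queue_.pop_front();
            visited.push_back(url);
            LinkGraph::const_iterator it = web_.find(url);
            if (it == web_.end()) continue;  // unknown page: no links
            // A real crawler would parse the document here and receive
            // got_href calls from the parser instead of this loop.
            for (std::size_t i = 0; i < it->second.size(); ++i)
                got_href(it->second[i]);
        }
        return visited;
    }

private:
    LinkGraph web_;
    std::set<std::string> seen_;
    std::deque<std::string> queue_;
};

// Usage sketch: a two-page site whose pages link to each other.
inline void crawler_demo() {
    LinkGraph web;
    web["a"].push_back("b");
    web["b"].push_back("a");
    Crawler crawler(web);
    assert(crawler.crawl("a").size() == 2);  // the cycle is visited once
}
```
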
One of the goals of this refactoring is that the Tallier class can be
used independently of the crawling logic; for example, files could be
pushed into the Tallier from any random program. (This feature is
needed by one of my clients.)
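
For example, a standalone program could feed text to a Tallier with no
Crawler involved at all. The sketch below is one hypothetical way to do
that: push_stream() and CollectingTallier are invented for illustration,
the interface is cut down to got_word only, and the tokenizer (lowercase
runs of alphanumeric characters) is a placeholder for the real word
parsing rules.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Cut-down interface for this sketch: only the word callback.
class Tallier {
public:
    virtual ~Tallier() {}
    virtual void got_word(const std::string& word) = 0;
};

// Hypothetical entry point: read any stream, split it into lowercase
// alphanumeric words, and push each word into the tallier.
// Returns the number of words delivered.
std::size_t push_stream(std::istream& in, Tallier& tallier) {
    std::size_t delivered = 0;
    std::string word;
    char c;
    while (in.get(c)) {
        unsigned char uc = static_cast<unsigned char>(c);
        if (std::isalnum(uc)) {
            word += static_cast<char>(std::tolower(uc));
        } else if (!word.empty()) {
            tallier.got_word(word);
            ++delivered;
            word.clear();
        }
    }
    if (!word.empty()) {  // flush the last word at end of input
        tallier.got_word(word);
        ++delivered;
    }
    return delivered;
}

// Trivial tallier that just remembers what it was given.
class CollectingTallier : public Tallier {
public:
    void got_word(const std::string& word) override { words.push_back(word); }
    std::vector<std::string> words;
};

// Usage sketch: no crawler, no URL, just a stream of text.
inline void push_demo() {
    std::istringstream in("Comments are welcome");
    CollectingTallier tallier;
    assert(push_stream(in, tallier) == 3);
    assert(tallier.words[0] == "comments");
}
```

Because push_stream() takes a std::istream, the same call would work
for a file opened with std::ifstream or for standard input.
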
I branched off of htdig-3-2-0-b3 because there were too many conflicts
between that and the main branch. I've made extensive changes already
in the htdig subdirectory and to a lesser extent in the htlib
subdirectory. I'm not looking forward to merging my work back into
the main branch, especially if other people have been doing much work
in those subdirectories.
I hope to have something to share within the next week or so--before
my next trip :-|
Comments are welcome.
Michael
--
Michael Haggerty
[EMAIL PROTECTED]
_______________________________________________
htdig-dev mailing list
[EMAIL PROTECTED]
http://lists.sourceforge.net/lists/listinfo/htdig-dev