According to Andrew Daviel:
> I have been working on a geographic search engine, and as I mentioned in
> my earlier "features" message, I am getting fed up with my Perl robot, and
> am trying to use htdig.
> 
> I have been somewhat successful and have a version without map
> navigation at http://geotags.com/htdig/
> 
> My question is really how much support, if any, I might get from the htdig
> community for this, either as mainstream htdig or from anyone else
> interested in this kind of thing.

My question would be how much support do you need.  Right now, all the
developers are pretty strapped for time, so you probably won't get a lot
of development work done for you.  However, if you want to make changes
to the C++ code yourself, I'm sure that Geoff, a few others, or I can
suggest the best approaches and where to put the changes.  If they're
done in a general enough way, they could even be incorporated into the
3.2 development code.

> The changes required are (so far)
> 
> A database element to store a position for each page
>   (I actually store a region code and placename too but don't use them)

This is one of the more involved changes, but certainly feasible.  It would
require extending the DocumentRef and DocumentDB classes in htcommon, to
handle the new field.  To maintain compatibility, you'd want to add the
new field code to the end of the enumeration in DocumentRef, to avoid
shifting over the other codes.
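
As a rough illustration of why the new code should go at the end, here is
a minimal sketch of a tagged record format.  Everything in it is
hypothetical: the enum names, the DocRecord struct, and the tab-separated
layout are made up for the example and are not the actual DocumentRef
codes or the on-disk format.  The point is only that existing codes keep
their numeric values, so records written before the change still read
back correctly.

```cpp
#include <sstream>
#include <string>

// Hypothetical field codes (NOT the real DocumentRef enumeration).
// The new code is appended last so the existing values do not shift.
enum FieldCode {
    FIELD_URL = 0,
    FIELD_TITLE = 1,
    FIELD_GEO_POSITION = 2   // new field, added at the end
};

// Hypothetical stand-in for a document record.
struct DocRecord {
    std::string url;
    std::string title;
    std::string geoPosition; // e.g. "49.2;-123.4", empty if untagged
};

// Write each field as "<code>\t<value>\n" (illustrative layout only).
std::string serialize(const DocRecord& d) {
    std::ostringstream out;
    out << FIELD_URL << '\t' << d.url << '\n';
    out << FIELD_TITLE << '\t' << d.title << '\n';
    if (!d.geoPosition.empty())
        out << FIELD_GEO_POSITION << '\t' << d.geoPosition << '\n';
    return out.str();
}

// Read the fields back; codes this reader does not know are skipped.
// Assumes every line starts with a numeric code, as serialize() writes.
DocRecord deserialize(const std::string& data) {
    DocRecord d;
    std::istringstream in(data);
    std::string line;
    while (std::getline(in, line)) {
        std::size_t tab = line.find('\t');
        if (tab == std::string::npos) continue;
        int code = std::stoi(line.substr(0, tab));
        std::string value = line.substr(tab + 1);
        switch (code) {
            case FIELD_URL:          d.url = value; break;
            case FIELD_TITLE:        d.title = value; break;
            case FIELD_GEO_POSITION: d.geoPosition = value; break;
            default: break;          // unknown code: ignore
        }
    }
    return d;
}
```

A record written without the position field simply deserializes with an
empty geoPosition, which is the compatibility property you want from the
real change.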

> An addition to the HTML parser to get the metadata

Should be quite easy.  This would likely involve both the HTML and
Retriever classes in htdig.

> An addition to the CGI parser to get a requested position (map click)

I don't know what that would involve, but it might be possible with a
front-end wrapper script for htsearch.

> A weighting algorithm to calculate geographic distance

You'd need to work out the specifics of the calculations.  Right now,
the scoring is done in Display::buildMatchList() in htsearch, but this
code may get reorganized in the next month or two (in the 3.2 betas).
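
One common choice for the distance part is the great-circle distance
from the haversine formula, sketched below.  This is only one possible
approach and not necessarily the calculation you have in mind.  The
function names, the mean Earth radius of 6371 km, and the geoWeight()
falloff are all assumptions made for the example; nothing here is taken
from the htsearch scoring code.

```cpp
#include <algorithm>
#include <cmath>

const double kPi = 3.14159265358979323846;
const double kEarthRadiusKm = 6371.0;   // mean radius, an approximation

double toRadians(double degrees) {
    return degrees * kPi / 180.0;
}

// Great-circle distance in km between two points given in decimal degrees.
double haversineKm(double lat1, double lon1, double lat2, double lon2) {
    double dLat = toRadians(lat2 - lat1);
    double dLon = toRadians(lon2 - lon1);
    double sinLat = std::sin(dLat / 2.0);
    double sinLon = std::sin(dLon / 2.0);
    double a = sinLat * sinLat +
               std::cos(toRadians(lat1)) * std::cos(toRadians(lat2)) *
               sinLon * sinLon;
    a = std::min(1.0, std::max(0.0, a));  // guard against rounding error
    double c = 2.0 * std::atan2(std::sqrt(a), std::sqrt(1.0 - a));
    return kEarthRadiusKm * c;
}

// Hypothetical weight in (0, 1]: 1.0 at zero distance, 0.5 at scaleKm,
// falling off smoothly beyond that.  A score could be multiplied by it.
double geoWeight(double distanceKm, double scaleKm) {
    return 1.0 / (1.0 + distanceKm / scaleKm);
}
```

With a falloff like this, a match at the requested position keeps its
full score and one at scaleKm away keeps half of it.  How that should
combine with the existing word scores is the part that needs working out.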

> I also have a config item to essentially force a ROBOTS NONE if there
> is no geographic tag on a page, so that I can refrain from indexing
> untagged pages.

Code to support this can go in the HTML class.  Adding config attributes
is pretty easy in 3.2, as everything for defining and documenting them
goes into htcommon/defaults.cc.  (Lots of examples to choose from in there!)

> I am also trying to add support for position passed in an experimental
> HTTP header, which allows one to dispense with the map and potentially
> generate requests based on current position automatically, e.g.
> using GPS.

This would affect the HtHTTP and Document classes in htdig, as well as
maybe the Retriever class.

Sounds like an interesting project.  I hope you have a C++ programmer to
help you get the changes into the code.

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930

_______________________________________________
htdig-dev mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/htdig-dev
