On Fri, 4 May 2001, Torbjörn Gylleus wrote:
> I have been testing the crawler function of htdig and it seems very nice. I
> also need to make indexes of structured data without fetching it by HTTP,
> such as plain text files and XML data with URL and description.
I'm not quite sure what you mean. It sounds like you have two different
questions:
1) Can you index data without fetching via HTTP? Yes and no. The 3.1.x
releases assume that everything is fetched via HTTP, though you can
"override" this with the local_urls
attribute. <http://www.htdig.org/attrs.html#local_urls> In the 3.2 code,
you can use more than just http:// URLs, such as file://, or you can define
your own "external transport" handler for whatever method you want.
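For example, a minimal configuration using local_urls might look like this
(the site URL and filesystem path are hypothetical placeholders; adjust
them to your own server layout):

```
# Map the site's URL prefix onto the local document root so htdig
# reads files straight from disk instead of fetching them over HTTP.
start_url:       http://www.example.com/
local_urls:      http://www.example.com/=/var/www/html/
# Skip HTTP entirely for documents covered by local_urls:
local_urls_only: true
```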
2) Can you index more than just HTML files? Yes. Plain text files are
handled automatically. XML files are probably treated as plain text unless
you have specified an external parser or converter. If you have
a simple XML schema, you can probably write your own parser or converter
script.
<http://www.htdig.org/attrs.html#external_parsers>
--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/
_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html