What I mean is that I have a list that already contains URLs, teasers/descriptions,
and keywords, so that I can associate specific keywords with a certain URL. I
just want to build a simple index from that data.
Today I use software from Fast (www.alltheweb.com) to do this, but I would like to
see whether it also works with GNU software such as Htdig.
The files look something like:
http://www.somesite.com;Description of the site;Keyword1,Keywords2,Keyword3
http://www.somesite2.com;Description of the site;Keyword1,Keywords2,Keyword3
etc.
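
To make the format above concrete, here is a minimal Python sketch of how such a list could be read into records before indexing. The names Entry, parse_line and parse_entries are made up for illustration and are not part of Htdig; the sketch also assumes that the URL and the keyword field never contain a semicolon, while the description may.

```python
from typing import Iterable, List, NamedTuple


class Entry(NamedTuple):
    """One line of the list: a URL, its description, and its keywords."""
    url: str
    description: str
    keywords: List[str]


def parse_line(line: str) -> Entry:
    """Parse 'url;description;kw1,kw2,...' into an Entry.

    The URL is taken up to the first semicolon and the keywords after the
    last one, so a semicolon inside the description does not break parsing.
    Raises ValueError if the line has fewer than three fields.
    """
    url, rest = line.split(";", 1)
    description, keyword_field = rest.rsplit(";", 1)
    keywords = [k.strip() for k in keyword_field.split(",") if k.strip()]
    return Entry(url.strip(), description.strip(), keywords)


def parse_entries(lines: Iterable[str]) -> List[Entry]:
    """Parse every non-blank line of the input."""
    return [parse_line(line.strip()) for line in lines if line.strip()]
```

With the records in this form, each one could be written out in whatever input format the indexer ends up accepting.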
It would also be great if XML format could be used as input.
-T
----- Original message -----
From: "Geoff Hutchison" <[EMAIL PROTECTED]>
To: "Torbjörn Gylleus" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: 4 May 2001 18:03
Subject: Re: [htdig] Data import
On Fri, 4 May 2001, Torbjörn Gylleus wrote:
> I have been testing the crawler function of htdig and it seems very nice. I
> also need to make indexes of structured data without fetching it by http,
> such as plain text files and XML data with url and description.
I'm not quite sure what you mean. It sounds like you have two different
questions:
1) Can you index data without fetching via HTTP? Yes and no. The 3.1.x
releases assume that everything is fetched via HTTP, though you can
"override" this with the local_urls attribute.
<http://www.htdig.org/attrs.html#local_urls> In the 3.2 code, you can use
more than just http:// URLs, such as file://, or you can define your own
"external transport" handler for whatever method you want.
2) Can you index more than just HTML files? Yes. Plain text files are
treated automatically. XML files are probably treated as plain text unless
you have specified an alternate external parser or converter. If you have
a simple XML schema, you can probably write your own parser or converter
script.
<http://www.htdig.org/attrs.html#external_parsers>
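
As a rough illustration of the "write your own converter" idea, here is a minimal Python sketch that turns one XML record into a small HTML page with the description and keywords in meta tags. The XML layout (an <entry> element with <url>, <description> and <keywords> children) and the function name entry_to_html are assumptions made up for this example. How ht://Dig actually invokes an external parser or converter, and what it expects back, is covered by the external_parsers documentation linked above, so this only shows the transformation step.

```python
import html
import xml.etree.ElementTree as ET


def entry_to_html(xml_text: str) -> str:
    """Convert one hypothetical <entry> record into a minimal HTML page.

    Assumed input layout (not defined anywhere in the thread):
      <entry><url>...</url><description>...</description>
             <keywords>kw1,kw2</keywords></entry>
    Missing child elements are treated as empty strings.
    """
    root = ET.fromstring(xml_text)
    url = (root.findtext("url") or "").strip()
    description = (root.findtext("description") or "").strip()
    keywords = (root.findtext("keywords") or "").strip()

    # html.escape also escapes quotes, so values are safe inside attributes.
    return (
        "<html><head>\n"
        f"<title>{html.escape(url)}</title>\n"
        f'<meta name="description" content="{html.escape(description)}">\n'
        f'<meta name="keywords" content="{html.escape(keywords)}">\n'
        "</head><body>\n"
        f"<p>{html.escape(description)}</p>\n"
        "</body></html>\n"
    )
```

A wrapper script would read the XML from wherever the indexer supplies it and print the result of entry_to_html to standard output.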
--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/
_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html