Re: [htdig] How to read HTML file tag attribute

Gilles Detillieux Thu, 14 Jun 2001 13:30:20 -0700
According to zheng hong:
> I get questions about the htdig reading the HTML web page. Once we read 
> the HTML file, we need to recognize the HTML tag, after reading these 
> tags, we can get our strings and words, these words belong to title, 
> head or contents etc. At this moment, we only get these words from the 
> tags, but we can't get the kind of words which have some attributes 
> like FONT SIZE, ALIGNMENT, STYLE (bold or italic). My questions 
> are 1) Can we attach these tag attributes to the words after we retrieve 
> the string from the tag nest? 2) Can we separate the title words and 
> contents words according to these HTML tag attributes?

I'm having a great deal of difficulty understanding what you are asking.
For question 1, do you mean you want the document excerpts in search
results to keep their formatting tags from the original document?  If so,
I'm afraid there's no easy way to modify htdig and htsearch to do this.

For question 2, htdig does make a distinction between words found in
between <title> and </title> tags, words inside link descriptions,
words inside meta keyword tags, and just plain old text in the body
of the document.  There are various *_factor attributes that let you
control how much weight words in these different contexts will carry
in the scoring of search results.
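
As a rough illustration of what that looks like, here is a sketch of
an htdig.conf fragment, assuming a 3.1.x-style configuration.  The
numbers are made up for the example and are not the shipped defaults.
Check the attribute documentation for your version before relying on
any of these names or values.

```conf
# Illustrative values only, not the defaults.
# Words between <title> and </title>
title_factor:       100
# Words in the text of links pointing to the document
description_factor: 150
# Words in <meta name="keywords" ...> tags
keywords_factor:    100
# Plain text in the body of the document
text_factor:        1
```

With settings along these lines, a match on a word in the title would
count for much more than a match on the same word in the body text.
Depending on your version, the factors may be applied at indexing time,
in which case you would need to re-run the dig before a change shows up
in search results.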

If what you are asking in either of these questions is something
different from what I've interpreted, please restate your questions
more clearly and precisely.

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930

_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a 
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html