Fuad Efendi wrote:

Andrzej,


I am trying to restore the human-oriented web-site tree using anchor text! As a
sample, a page with anchor text "Motherboards" has many linked pages with
concrete motherboards, etc.; we can group information in many cases.

Anchor text is the true subject of the page, but only within the same domain. BTW,

Well, as your original observation points out, this is not always the case - but this is more a topic for a philosophical debate about what the truth is...

some pages have <META name="keywords" content="...">, and Nutch doesn't
handle it.

Nutch does handle META tags up to a point: they are correctly processed in parse-html and then passed to all HtmlParseFilters, and it's up to you what you do with them. You can put them into parseData.metadata, index them later on, etc. By default, though, Nutch doesn't process them any further.
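
To illustrate the "it's up to you" part, here is a minimal standalone sketch of the kind of logic such a filter could contain: walk the parsed DOM, find the keywords META tag, and copy its content into a metadata map. It deliberately does not use the Nutch API. The class MetaKeywordsSketch, its methods, and the plain Map standing in for parseData.metadata are all hypothetical names chosen for illustration; in a real plugin the equivalent code would sit inside an HtmlParseFilter implementation, whose exact signature depends on the Nutch version.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical illustration only: not a Nutch class and not the Nutch plugin API.
public class MetaKeywordsSketch {

    // Recursively walks the DOM and copies the content of
    // <meta name="keywords"> into the metadata map (a stand-in for parse metadata).
    static void collectKeywords(Node node, Map<String, String> metadata) {
        if (node.getNodeType() == Node.ELEMENT_NODE
                && "meta".equalsIgnoreCase(node.getNodeName())) {
            Element meta = (Element) node;
            if ("keywords".equalsIgnoreCase(meta.getAttribute("name"))) {
                metadata.put("keywords", meta.getAttribute("content"));
            }
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collectKeywords(children.item(i), metadata);
        }
    }

    // Parses well-formed XHTML with the JDK's XML parser and returns the collected metadata.
    // Assumption: the input is well-formed; real-world HTML needs a tolerant HTML parser.
    static Map<String, String> extract(String xhtml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
        Map<String, String> metadata = new LinkedHashMap<>();
        collectKeywords(doc.getDocumentElement(), metadata);
        return metadata;
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><head>"
                + "<META name=\"keywords\" content=\"motherboards, hardware\"/>"
                + "</head><body/></html>";
        System.out.println(extract(page).get("keywords"));
    }
}
```

Once the keywords are in the metadata, a later indexing step can pick them up as a separate field; how that is wired up is left out here.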

--
Best regards,
Andrzej Bialecki     <><
___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

