Hi All,

I am new to Apache Nutch and am using it to crawl some sites. I need to filter
pages and extract content based on words in the page text, not on the URL. For example:


  1.  I only want to crawl sites whose content (text) contains a word such as
'shop' or 'product'. If no such word is present, the crawler should not follow
any further links.
  2.  I want to get structured data (JSON fields, e.g. text, url, metadata)
instead of unstructured data (the whole page source).
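
To make the two requirements concrete, here is a rough sketch in plain Python of the behaviour I am after. This is not Nutch code: the function names (is_relevant, process_page) and the shape of the JSON record are my own assumptions, and I am assuming that "not crawl further links" means dropping the outlinks of a page that does not match. The match is a simple case-insensitive substring test, so 'shop' would also match 'shopping'.

```python
import json

# Words taken from requirement 1 above.
KEYWORDS = ("shop", "product")


def is_relevant(text, keywords=KEYWORDS):
    """Return True if the page text contains any keyword (case-insensitive substring match)."""
    lowered = text.lower()
    return any(word in lowered for word in keywords)


def process_page(url, text, metadata, outlinks):
    """Hypothetical helper: return (json_record, links_to_follow) for one fetched page.

    A page without any keyword yields no record and no links,
    so the crawl does not continue from it.
    """
    if not is_relevant(text):
        return None, []
    # Requirement 2: structured fields instead of the raw page source.
    record = json.dumps({"url": url, "text": text, "metadata": metadata})
    return record, list(outlinks)
```

What I am looking for is the Nutch way of getting the same effect: the keyword check applied to the parsed text, and the output written as records with fields like these.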

Any help would be appreciated.

Regards
Muhammad umer
