Hi,

I was trying to modify the files under src/plugin/parse-html so that
Nutch could index images (GIF, JPEG, BMP, etc.).
I injected some URLs pointing to GIF files and, after commenting out
the lines

//if (!"".equals(contentType) && !contentType.startsWith("text/html"))
// throw new ParseException("Content-Type not text/html: " + contentType);


the files get indexed.

OK, I know it's a bad approach, but it's just a start.
For now I'm trying to index only the URLs, so that Nutch can at least search on the image name.
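Instead of commenting the check out entirely, a narrower option might be to accept image MIME types alongside text/html. Below is a minimal sketch, assuming a hypothetical helper isAccepted(...) that would replace the condition above. The list of image types is my own choice, not something Nutch defines. Like the original check, it lets an empty content type through.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class ContentTypeCheck {
    // Hypothetical whitelist of image MIME types; not defined by Nutch.
    private static final List<String> IMAGE_TYPES =
        Arrays.asList("image/gif", "image/jpeg", "image/bmp", "image/png");

    /** Returns true if the parser should accept this content type. */
    static boolean isAccepted(String contentType) {
        if (contentType == null || contentType.isEmpty()) {
            return true; // unknown type: let it through, as the original check did for ""
        }
        // Strip parameters such as "; charset=UTF-8" and normalise case.
        String base = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        return base.equals("text/html") || IMAGE_TYPES.contains(base);
    }

    public static void main(String[] args) {
        System.out.println(isAccepted("text/html; charset=UTF-8")); // true
        System.out.println(isAccepted("image/gif"));                // true
        System.out.println(isAccepted("application/pdf"));          // false
    }
}
```

With this, the parser would throw only when !isAccepted(contentType), so other types (PDF, etc.) would still be rejected.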


But the problem is how to follow <img src=...> URLs.
I tried adding a new line here:


public static HashMap linkParams = new HashMap();

static {
    linkParams.put("a", new LinkParams("a", "href", 1));
    // the line I added:
    linkParams.put("img", new LinkParams("img", "src", 1));
}

but it didn't work.
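For reference, this is the kind of extraction I expected the "img" entry to trigger: walk the DOM, take each img element's src, resolve it against the page URL, and keep its alt text. Below is a minimal standalone sketch using only the JDK's DOM API (org.w3c.dom) on a well-formed XHTML string. The class names, ImageLink, and parseXhtml are hypothetical and not part of Nutch. It also assumes lower-case element names. An HTML parser may upper-case them, in which case getElementsByTagName("IMG") would be needed.

```java
import java.io.ByteArrayInputStream;
import java.net.MalformedURLException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class ImageLinkExtractor {
    /** An image outlink: absolute URL plus its alt text ("" if none). */
    static final class ImageLink {
        final String url;
        final String alt;
        ImageLink(String url, String alt) { this.url = url; this.alt = alt; }
    }

    /** Collects every <img src=...> in doc, resolved against baseUrl. */
    static List<ImageLink> extract(Document doc, String baseUrl) {
        List<ImageLink> links = new ArrayList<>();
        URL base;
        try {
            base = new URL(baseUrl);
        } catch (MalformedURLException e) {
            return links; // cannot resolve relative src values without a valid base
        }
        NodeList imgs = doc.getElementsByTagName("img");
        for (int i = 0; i < imgs.getLength(); i++) {
            Element img = (Element) imgs.item(i);
            String src = img.getAttribute("src"); // DOM returns "" if the attribute is absent
            if (src.isEmpty()) {
                continue;
            }
            try {
                URL abs = new URL(base, src); // resolve relative paths like "pics/a.gif"
                links.add(new ImageLink(abs.toString(), img.getAttribute("alt")));
            } catch (MalformedURLException e) {
                // skip src values that cannot be parsed as URLs
            }
        }
        return links;
    }

    /** Parses a well-formed XHTML string (demo only; real HTML needs an HTML parser). */
    static Document parseXhtml(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XHTML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parseXhtml("<html><body><p>Beach</p>"
            + "<img src=\"pics/sunset.gif\" alt=\"Sunset at the beach\"/></body></html>");
        for (ImageLink l : extract(doc, "http://example.com/gallery/index.html")) {
            System.out.println(l.url + " | " + l.alt);
        }
    }
}
```

Keeping the alt text next to the resolved URL is what would let it be indexed as the image's description later.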

My goal was to make Nutch search for images more or less the way Google does,
so that the image file itself wouldn't need to be parsed: just index the image name,
the surrounding page content, the alt text, etc.
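Indexing the image name could start by turning the file name into search terms. Below is a minimal sketch with a hypothetical terms(...) helper. It assumes the extension is dropped and the name is split on anything that isn't an ASCII letter or digit, so non-ASCII names would need a different rule.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class ImageNameTerms {
    /** Splits the file-name part of an image URL into lower-case search terms. */
    static List<String> terms(String imageUrl) {
        List<String> out = new ArrayList<>();
        String path;
        try {
            path = new URL(imageUrl).getPath();                 // e.g. "/pics/Blue_Sky-2.gif"
        } catch (MalformedURLException e) {
            return out;                                          // not a URL: no terms
        }
        String file = path.substring(path.lastIndexOf('/') + 1); // "Blue_Sky-2.gif"
        int dot = file.lastIndexOf('.');
        if (dot > 0) {
            file = file.substring(0, dot);                       // drop the extension
        }
        // Split on anything that isn't an ASCII letter or digit (_, -, ., spaces...).
        for (String t : file.split("[^A-Za-z0-9]+")) {
            if (!t.isEmpty()) {
                out.add(t.toLowerCase(Locale.ROOT));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(terms("http://example.com/pics/Blue_Sky-2.gif")); // [blue, sky, 2]
    }
}
```

These terms could then be indexed together with the alt text and the text of the page that links to the image.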


Any suggestions or help would be much appreciated.

Thanks!
Marco
