Hi, I'm pretty new to Nutch and I'm trying to modify the code so that it stores the words before and after a hyperlink as well as the anchor text. I've been looking through the Nutch code for a couple of days and I'm still a little unclear about the layout. Nutch parses incoming web pages in HTMLParser.java, right? I can't seem to find the code for URL processing in there, though. Where exactly does it parse the anchor text and write it to the database?
Any help would be greatly appreciated! Brian