Hello,
To change Nutch's standard HTML parsing, the best place to start would probably be the parse-html plugin.
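
To illustrate the kind of filtering asked about below, here is a minimal stand-alone sketch. It is not code from Nutch or from the parse-html plugin: the class DomTextSketch, its Result holder and the list of skipped elements are all hypothetical, and it uses only the JDK DOM API on well-formed XHTML input. It walks a DOM tree, leaves dropdown content out of the text, and collects anchors in a separate list instead of mixing them into the text.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical sketch, not Nutch code: separates page text from anchors on a plain DOM tree.
public class DomTextSketch {
    // Elements whose content is kept out of the text (an assumption made for this example).
    private static final Set<String> SKIPPED =
            new HashSet<>(Arrays.asList("select", "option", "script", "style"));

    // Simple holder for the two outputs.
    public static final class Result {
        public final StringBuilder text = new StringBuilder();
        public final List<String> anchors = new ArrayList<>();
    }

    // Parses well-formed XHTML and returns the filtered text plus the anchors found.
    public static Result extract(String xhtml) {
        try {
            Node root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)))
                    .getDocumentElement();
            Result result = new Result();
            walk(root, result);
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("input is not well-formed XHTML", e);
        }
    }

    private static void walk(Node node, Result out) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            String t = node.getNodeValue().trim();
            if (!t.isEmpty()) {
                if (out.text.length() > 0) {
                    out.text.append(' ');
                }
                out.text.append(t);
            }
            return;
        }
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            String name = node.getNodeName().toLowerCase(Locale.ROOT);
            if (SKIPPED.contains(name)) {
                return; // drop dropdown, script and style content entirely
            }
            if (name.equals("a")) {
                Element a = (Element) node;
                // Record "href text" and keep the anchor text out of the page text.
                out.anchors.add(a.getAttribute("href") + " " + a.getTextContent().trim());
                return;
            }
        }
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            walk(c, out);
        }
    }
}
```

Calling DomTextSketch.extract(...) on a page fragment gives the remaining text in result.text and one "href text" entry per link in result.anchors. Whether anchor text should also stay in the page text is a design choice; the sketch drops it only because that matches the request below.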
Regards
Piotr
Fuad Efendi wrote:
1. This is part of ParseText:
Any Accessories Backup Devices & Media Barebone Systems Camcorder
Accessories Camcorders Cases & External Enclosures CD / DVD Drives &
Media Cooling Devices Digital Camera Accessories Digital Cameras

- this is the content of a dropdown (<option> elements in the HTML)


2. Some of the text in ParseText appears to be anchor text; I checked it
visually against the web page.


-----Original Message-----
From: Fuad Efendi [mailto:[EMAIL PROTECTED]
Sent: Monday, August 15, 2005 1:20 PM
To: [email protected]
Subject: Fetcher, ParseText, ParseData - need to modify


I just caught some output from Fetcher.FetcherThread.outputPage(.) and
noticed that some anchors end up in the text, and that the content of some
<option> tags ends up in the text too.
          LOG.info("ParseText = "+text);
          LOG.info("ParseData = "+ parseData);

I'd like to modify this behaviour: ParseText should contain only the subset
of the text that I need, and ParseData should contain all anchors.

Where should I start? It would be nice to have plugins that modify Fetcher
behaviour...


