I don't think so, but you could write one; take a look at the Nutch API.
On 7/25/06, Aaron Tang <[EMAIL PROTECTED]> wrote:
Is there any Nutch API that can do this?

-----Original Message-----
From: Lourival Júnior [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, July 26, 2006 1:41 AM
To: [email protected]
Subject: Re: How can I get a page content or parse data by the page's URL

If I'm not wrong, you can't do this. The segread command only accepts these arguments:

SegmentReader [-fix] [-dump] [-dumpsort] [-list] [-nocontent] [-noparsedata] [-noparsetext] (-dir segments | seg1 seg2 ...)
        NOTE: at least one segment dir name is required, or '-dir' option.
        -fix            automatically fix corrupted segments
        -dump           dump segment data in human-readable format
        -dumpsort       dump segment data in human-readable format, sorted by URL
        -list           print useful information about segments
        -nocontent      ignore content data
        -noparsedata    ignore parse_data data
        -noparsetext    ignore parse_text data
        -dir segments   directory containing multiple segments
        seg1 seg2 ...   segment directories

On 7/25/06, Aaron Tang <[EMAIL PROTECTED]> wrote:
>
> Hi all,
>
> How can I get a page content or parse data by the page's URL?
> Just like the command:
>
> $ bin/nutch segread crawl/segments/20060725213636/ -dump
>
> will dump pages in the segment.
>
> I'm using Nutch 0.7.2 on Cygwin under WinXP.
>
> Thanks!
>
> Aaron

--
Lourival Junior
Universidade Federal do Pará
Curso de Bacharelado em Sistemas de Informação
http://www.ufpa.br/cbsi
Msn: [EMAIL PROTECTED]
--
Lourival Junior
Universidade Federal do Pará
Curso de Bacharelado em Sistemas de Informação
http://www.ufpa.br/cbsi
Msn: [EMAIL PROTECTED]
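[A practical workaround under 0.7.x, since segread has no per-URL lookup: dump the whole segment with `segread -dump` and filter the dump text for the URL you want. The sketch below is illustrative only; the `Recno::`/`URL::` record layout and blank-line record separators are assumptions about the dump format, and `segment.txt` stands in for real segread output.]

```shell
# Simulated dump; a real one would come from something like:
#   bin/nutch segread crawl/segments/20060725213636/ -dump > segment.txt
# The record layout below is an assumption for illustration.
cat > segment.txt <<'EOF'
Recno:: 0
URL:: http://example.com/a
Content:: page A text

Recno:: 1
URL:: http://example.com/b
Content:: page B text
EOF

# Paragraph mode (RS=""): awk reads one blank-line-separated record at
# a time and prints any record containing the target URL.
awk -v RS= -v url="http://example.com/b" 'index($0, url)' segment.txt
```

Note this rescans the entire dump on every lookup, so it gets slow on big segments; for anything beyond one-off queries, writing against the SegmentReader API (as suggested above) is the better route.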
