Re: Looking for a gatherer.

2001-01-22 Thread Krishna N. Jha

Look into Pavuk, webBase, wget, or Larbin. They might satisfy most
(though not all) of the requirements mentioned here.

Hope it helps,
Krishna Jha
Bhasha Inc
PS: actually, this is a repeat of an earlier posting; Nick - is it time
for an InternetRobots FAQ here?

Jim MacDiarmid wrote:

 Is there anything like this that would run on a Windows 98 or NT platform?

  Jim MacDiarmid, Senior Software Engineer
  PACEL Corp.
  8870 Rixlew Lane
  Manassas, VA 20109
  (703) 257-4759
  FAX:  (703) 361-6706
  www.pacel.com
 
 
 
  -----Original Message-----
  From: Simon Wilkinson [SMTP:[EMAIL PROTECTED]]
  Sent: Sunday, January 14, 2001 4:37 PM
  To:   [EMAIL PROTECTED]
  Subject:  Re: Looking for a gatherer.
 
   I am looking for a spider/gatherer with the following characteristics:
   * Enables the control of the crawling process by URL substring/regexp
     and HTML context of the link.
   * Enables the control of the gathering (i.e. saving) processes by URL
     substring/regexp, MIME type, other header information and ideally
     by some predicates on the HTML source.
   * Some way to save page/document metadata, ideally in a database.
   * Freeware, shareware or otherwise inexpensive would be nice.
 
  You might like to take a look at Harvest-NG
  (http://webharvest.sourceforge.net/ng), which is free software. It
  supports everything you describe above. It saves the metadata in a
  Perl DBM database; some work has been done (though not yet completed)
  on using the DBI interface to store metadata in a remote database
  instead. You may find that some knowledge of Perl is helpful in
  adapting it exactly to your needs - much use is made of Perl regular
  expressions in the pattern matching, for instance.
 
  Cheers,
 
  Simon.
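
A minimal Perl sketch of the kind of regex-driven crawl and gather
control described above (illustrative only, not the actual Harvest-NG
API; the rule names, patterns, and URL are invented, and the standard
LWP::UserAgent module is assumed):

    #!/usr/bin/perl
    # Illustrative sketch of regex-controlled crawling and gathering.
    # NOT the Harvest-NG API; all rule names here are made up.
    use strict;
    use warnings;
    use LWP::UserAgent;

    # Hypothetical rule sets: which URLs to follow, which documents to save.
    my @crawl_patterns  = ( qr{^http://webharvest\.sourceforge\.net/} );
    my @gather_patterns = ( qr{\.html?$} );
    my %save_mime_types = ( 'text/html' => 1, 'text/plain' => 1 );

    my $ua = LWP::UserAgent->new( agent => 'example-gatherer/0.1' );

    # Crawl control: follow a link only if its URL matches a pattern.
    sub should_crawl {
        my ($url) = @_;
        return grep { $url =~ $_ } @crawl_patterns;
    }

    # Gather control: save a document only if both its URL and the
    # MIME type reported in the response headers pass the rules.
    sub should_gather {
        my ( $url, $response ) = @_;
        return 0 unless grep { $url =~ $_ } @gather_patterns;
        my $type = $response->header('Content-Type') || '';
        $type =~ s/;.*$//;    # strip charset and other parameters
        return $save_mime_types{$type};
    }

    my $url = 'http://webharvest.sourceforge.net/ng.html';
    if ( should_crawl($url) ) {
        my $response = $ua->get($url);
        if ( $response->is_success && should_gather( $url, $response ) ) {
            print "would save: $url\n";
        }
    }

Checking the HTML context of a link or predicates on the HTML source
would mean parsing the fetched page (e.g. with HTML::Parser) before
applying further rules; that part is omitted here for brevity.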




Re: Looking for a gatherer.

2001-01-10 Thread Otis Gospodnetic

Add Larbin to that list.

--- Krishna N. Jha [EMAIL PROTECTED] wrote:
 Look into webBase, pavuk, wget - there are some other similar free
 products out there.
 (I am not sure I fully understand/appreciate all your requirements,
 though; if you wish, you can clarify them to me.)
 We also have web-crawlers which offer more flexibility - but are not
 free.

 Hope that helps,
 Krishna Jha

 Mark Friedman wrote:
 
  I am looking for a spider/gatherer with the following characteristics:
 
  * Enables the control of the crawling process by URL substring/regexp
    and HTML context of the link.
  * Enables the control of the gathering (i.e. saving) processes by URL
    substring/regexp, MIME type, other header information and ideally
    by some predicates on the HTML source.
  * Some way to save page/document metadata, ideally in a database.
  * Freeware, shareware or otherwise inexpensive would be nice.
 
  Thanks in advance for any help.
 
  -Mark
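
The metadata-storage requirement could look something like the
following rough Perl sketch, using a DBM file in the style the
Harvest-NG message above mentions (the file name and record layout
here are invented for illustration):

    #!/usr/bin/perl
    # Rough sketch of saving per-page metadata in a Perl DBM database.
    # The file name and field layout are assumptions, not any tool's format.
    use strict;
    use warnings;
    use DB_File;
    use Fcntl;

    my %metadata;
    tie %metadata, 'DB_File', 'gatherer-metadata.db', O_CREAT | O_RDWR, 0644
        or die "cannot tie DBM file: $!";

    # DBM files store flat key/value pairs, so structured metadata has
    # to be serialized; a record-separator character is used here.
    sub save_metadata {
        my ( $url, $mime_type, $title, $fetched_at ) = @_;
        $metadata{$url} = join "\x1e", $mime_type, $title, $fetched_at;
    }

    save_metadata( 'http://webharvest.sourceforge.net/ng',
        'text/html', 'Harvest-NG', scalar localtime );

    untie %metadata;

Moving the same records to a remote database via DBI would mostly mean
replacing the tied hash with prepared INSERT statements.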

