Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The following page has been changed by KenKrugler:
http://wiki.apache.org/nutch/ApacheConUs2009MeetUp

------------------------------------------------------------------------------
- We're planning to have a "Web Crawler Developer" !MeetUp at this year's ApacheCon US in Oakland.
+ We're planning to have a "Web Crawler Developer" !MeetUp at this year's [http://www.us.apachecon.com/c/acus2009/ ApacheCon US] in Oakland.
  
  Tentative plan is for Thursday evening, November 5th. The actual schedule for !MeetUps is [http://wiki.apache.org/apachecon/ApacheMeetupsUs09 here].
  
@@ -11, +11 @@

   * Politeness vs. efficiency - various options for how to be considered polite, while still crawling quickly.
   * robots.txt processing - current problems with existing implementations
   * Avoiding crawler traps - link farms, honeypots, etc.
-  * Parsing content - home grown, Neko/TagSoup, Tika, screen scraping
+  * Parsing content - home grown, Neko/!TagSoup, Tika, screen scraping
   * Search infrastructure - options for serving up crawl results (Nutch, Solr, Katta, others?)
   * Testing challenges - is it possible to unit test a crawler?
   * Fuzzy classification - mime-type, charset, language.
