My personal favorites list.
In a day or so I will count all votes and post them.

NUTCH-141       jobdetails.jsp doesnt work on webbrowser "safari"
+1
NUTCH-140 Add alias capability in parse-plugins.xml file that allows mimeType->extensionId mapping
NUTCH-139       Standard metadata property names in the ParseData metadata
+1
NUTCH-138       non-Latin-1 characters cannot be submitted for search
+1
NUTCH-137       footer is not displayed in search result page   
NUTCH-136 mapreduce segment generator generates 50 % less than excepted urls
NUTCH-34        Parsing different content formats       
NUTCH-3 multi values of header discarded        
+1
NUTCH-134       Summarizer doesn't select the best snippets     
NUTCH-132       Add ability to sort on more than one column     
NUTCH-131       Non-documented variable: mapred.child.heap.size
NUTCH-98        RobotRulesParser interprets robots.txt incorrectly
NUTCH-129 rtf-parser does not work when opened with wordpad files and saved
NUTCH-120       one "bad" link on a page kills parsing        
+1
NUTCH-128       second configuration nodes overwrites first node
NUTCH-127       uncorrect values using -du, or ls does not return items
NUTCH-126       Fetching via https does not work with a proxy (patch)
+1
NUTCH-125       OpenOffice Parser plugin        
+1
NUTCH-110       OpenSearchServlet outputs illegal xml characters
+1
NUTCH-36        Chinese in Nutch        
NUTCH-123       Cache.jsp some times generate NullPointerException
+1 (may already be fixed)
NUTCH-39        pagination in search result     
NUTCH-49 Flag for generate to fetch only new pages to complement the -refetchonly flag
NUTCH-94        MapFile.Writer throwing 'File exists error'.    
NUTCH-117 Crawl crashes with java.io.IOException: already exists: C:\nutch\crawl.intranet\oct18\db\webdb.new\pagesByURL
NUTCH-122       block numbers need a better random number generator
NUTCH-82        Nutch Commands should run on Windows without external tools
NUTCH-121       SegmentReader for mapred        
NUTCH-119       Regexp to extract outlinks incorrect    
+1
NUTCH-118       FAQ link points to invalid URL  
NUTCH-115       jobtracker.jsp shows too much information       
NUTCH-103       Vivisimo like treeview and url redirect 
NUTCH-108       tasktracker crashs when reconnecting to a new jobtracker.
NUTCH-113       Disable permanent DNS-to-IP caching for JVM 1.4
NUTCH-111 ndfs.replication is not documented within the nutch-default.xml configuration file.
NUTCH-100       New plugin urlfilter-db
+1
        
NUTCH-101       RobotRulesParser        
NUTCH-96 MapFile.Writer throws directory exists exception if run multiple times in the same JVM or server JVM.
NUTCH-106       Datanode corruption     
NUTCH-105 Network error during robots.txt fetch causes file to be ignored
NUTCH-104 Nutch query parser does not support CJK bi-gram segmentation.
NUTCH-102       jobtracker does not start when webapps is in src
NUTCH-95        DeleteDuplicates depends on the order of input segments
NUTCH-92        DistributedSearch incorrectly scores results    
NUTCH-87        Efficient site-specific crawling for a large number of sites
NUTCH-91        empty encoding causes exception
+1
        
NUTCH-90        reduce logging output of IndexSegment   
NUTCH-52        Parser plugin for MS Excel files        
NUTCH-86        LanguageIdentifier API enhancements     
NUTCH-84        Fetcher for constrained crawls  
NUTCH-74        French Analyzer Plugin
+1
        
NUTCH-83        Release deliverable as zip      
NUTCH-81        Webapp only works when deployed in root 
NUTCH-79        Fault tolerant searching.       
NUTCH-64 no results after a restart of a search--server (without tomcat restart)
NUTCH-76        NDFS DataNode advertises localhost as it's address
NUTCH-75 Patch for WebDBReader to get more detailed information about WebDBs
NUTCH-73        A page for CSV results  
NUTCH-72        Query basic filter with correction feature      
NUTCH-70        duplicate pages - virtual hosts in db.  
NUTCH-68        A tool to generate arbitrary fetchlists 
+1
NUTCH-62 Add html META tag information into metaData in index-more plugin
++1!
NUTCH-61        Adaptive re-fetch interval. Detecting umodified content
++1! But is it ready to use?
NUTCH-55 Create dmoz.org search plugin - incorporate the dmoz.org title/category/description if available &
NUTCH-59        meta data support in webdb      
NUTCH-25        needs 'character encoding' detector     
NUTCH-44        too many search results 
NUTCH-42        enhance search.jsp such that it can also returns XML
NUTCH-50        Benchmarks & Performance goals      
NUTCH-13        If dns points to 127.0.0.1, the url is also crawled
NUTCH-48        "Did you mean" query enhancement/refignment feature request
+1
NUTCH-47        Configure host filter to do wildcard prefixes - *.redhat.com
NUTCH-45        Log corrupt segments in SegmentMergeTool        
NUTCH-26        New Http Authentication mechanism       
NUTCH-24        Cannot handle incorrectly cased Content-Type    
NUTCH-23        content text/xml parser 
NUTCH-18        Windows servers include illegal characters in URLs
NUTCH-16        boost documents matching a url pattern  
NUTCH-14        NullPointerException NutchBean.getSummary       
NUTCH-12        WebDBReader options to print incoming links

