Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.
The following page has been changed by KurosakaTeruhiko:
http://wiki.apache.org/nutch/Features

------------------------------------------------------------------------------

*How does the search engine handle punctuation and special characters? (and what's configurable?)
*Which document formats are supported?
+ * Guessing from the names of the available parser plugins, this is probably it:
+ * Plain text (in a fixed, preconfigured charset only)
+ * HTML (in almost any charset)
+ * JavaScript (for extracting links only?)
+ * Microsoft PowerPoint (.ppt files)
+ * Microsoft Word (.doc files)
+ * Adobe PDF
+ * RSS
+ * RTF
+ * MP3 (?) Is there any text in an MP3 file?
+ * ZIP (?) This seems to expand a ZIP archive of plain-text files and return the concatenated text.
+
*What post-coordination options are available? (hey Karen, what does this mean?)
*How easy is Nutch to configure?

_______________________________________________
Nutch-cvs mailing list
Nutch-cvs@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-cvs