Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.

The following page has been changed by JeromeCharron:
http://wiki.apache.org/nutch/JeromeCharron

------------------------------------------------------------------------------
 == Nutch contributions ==
  * MimeTypeUtil package (org.apache.nutch.util.mime)
-  * TODO: Provide a content-type mapper (see ParserFactoryImprovementProposal requirements)
+  * '''TODO''': Provide a content-type mapper (see ParserFactoryImprovementProposal requirements)
-  * TODO: Replace the current XML descriptor with the [http://freedesktop.org/wiki/Standards_2fshared_2dmime_2dinfo_2dspec#head-0efc2e6be4c23b9a513d7ce0dcff8ed80e8912e7 Freedesktop shared-mime-info-spec] one
+  * '''TODO''': Replace the current XML descriptor with the [http://freedesktop.org/wiki/Standards_2fshared_2dmime_2dinfo_2dspec#head-0efc2e6be4c23b9a513d7ce0dcff8ed80e8912e7 Freedesktop shared-mime-info-spec] one
  * LanguageIdentifierPlugin
   * Some benchmarks: LanguageIdentifierBenchs
   * Enhance the LanguageParseFilter by checking the validity of the parsed language string.
-  * TODO: Enhance the LanguageParseFilter by correlating all the available clues (instead of taking only the first piece of information found): DublinCore / Meta-Http-Equiv / Content-Language and statistical content analysis.
+  * '''TODO''': Enhance the LanguageParseFilter by correlating all the available clues (instead of taking only the first piece of information found): DublinCore / Meta-Http-Equiv / Content-Language and statistical content analysis.
+  * '''TODO''': Improve the API by returning an ordered list of candidate languages instead of just one.
  * MultiLingualSupport proposal.
   * Framework for multilingual analysis:
    * Analysis ExtensionPoint
@@ -32, +33 @@
    * LibLuceneAnalyzersPlugin packaged and committed
    * AnalysisFrPlugin (Lucene French Analyzer Wrapper) packaged and committed
    * AnalysisDePlugin (Lucene German Analyzer Wrapper) packaged and committed
-  * TODO:
+  * '''TODO''': Multilingual querying support
  * ParserFactoryImprovementProposal
+  * '''TODO''': Use content-type/extension-id mapping instead of content-type/plugin-id
  * PluginRepository enhancements:
   * Add the ability to handle plugin inter-dependencies (i.e., a plugin can declare a runtime dependency on one or more other plugins using the <requires><import plugin="plugin-id"/></requires> directive in its plugin.xml descriptor).
   * Add the ability to automatically load (depending on configuration) the plugins required by declared dependencies, with a check for circular dependencies.
  * MarkupLanguageParserProposal
+  * '''TODO''': [http://microformats.org/ Microformats] HtmlParseFilter:
+   * [http://microformats.org/wiki/rel-tag rel-tag]
+   * [http://microformats.org/wiki/hreview hreview]
+   * ...
  * Nutch [http://fr.wikipedia.org/wiki/Nutch article] on French Wikipedia.
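
The two LanguageParseFilter TODOs above (correlating all the available clues, and returning an ordered list of candidate languages instead of a single one) could fit together roughly as follows. This is a hypothetical Java sketch, not the LanguageIdentifierPlugin API: the class name, the per-source weights and the scoring rule (a simple weighted vote) are assumptions. In the spirit of the existing validity check, raw language strings are reduced to their primary subtag and validated against the two-letter ISO 639 codes returned by `java.util.Locale.getISOLanguages()`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: turns several language clues into a ranked list of candidates. */
final class LanguageClueCorrelator {

    /** One piece of evidence, e.g. a Dublin Core value, an http-equiv header or a statistical guess. */
    static final class Clue {
        final String source;   // where the clue came from (informational only)
        final String language; // raw language string as found in the document
        final double weight;   // assumed trust in this source (hypothetical values)

        Clue(String source, String language, double weight) {
            this.source = source;
            this.language = language;
            this.weight = weight;
        }
    }

    // Two-letter ISO 639 codes known to the JDK, used to reject invalid language strings.
    private static final Set<String> ISO_LANGUAGES =
            new HashSet<>(Arrays.asList(Locale.getISOLanguages()));

    /** Reduces "en-US" or "FR_fr" to "en" / "fr"; returns null if the primary subtag is not a known code. */
    static String normalize(String raw) {
        if (raw == null) {
            return null;
        }
        String primary = raw.trim().toLowerCase(Locale.ROOT).split("[-_]", 2)[0];
        return ISO_LANGUAGES.contains(primary) ? primary : null;
    }

    /** Sums the weights of valid clues per language and returns languages by descending score. */
    static List<String> rank(List<Clue> clues) {
        Map<String, Double> scores = new HashMap<>();
        for (Clue clue : clues) {
            String lang = normalize(clue.language);
            if (lang != null) { // invalid language strings are ignored
                scores.merge(lang, clue.weight, Double::sum);
            }
        }
        List<Map.Entry<String, Double>> entries = new ArrayList<>(scores.entrySet());
        entries.sort((a, b) -> {
            int byScore = Double.compare(b.getValue(), a.getValue()); // highest score first
            return byScore != 0 ? byScore : a.getKey().compareTo(b.getKey()); // alphabetical tie-break
        });
        List<String> ranked = new ArrayList<>();
        for (Map.Entry<String, Double> entry : entries) {
            ranked.add(entry.getKey());
        }
        return ranked;
    }
}
```

For example, clues `en-US` (Dublin Core, weight 1.0), `EN` (Content-Language, weight 0.8) and `fr` (statistical analysis, weight 1.2) rank `en` (score 1.8) ahead of `fr` (score 1.2). A weighted vote is only one possible way to correlate the clues, and the weights would need tuning against real documents.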

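The PluginRepository enhancements above (automatically loading the plugins named in `<requires><import plugin="..."/></requires>` while checking for circular dependencies) amount to a depth-first topological sort. The Java sketch below is hypothetical and is not the actual PluginRepository code: the class and method names are invented, and the `requires` map stands in for the dependency lists that would be read from each plugin.xml.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: orders plugins so that each one is loaded after the plugins it requires. */
final class PluginDependencyResolver {

    /**
     * @param requires plugin id -> ids listed in its requires/import block
     * @param enabled  plugin ids enabled by configuration
     * @return load order, dependencies first; required plugins are added even if not enabled
     * @throws IllegalStateException if a circular dependency is found
     */
    static List<String> loadOrder(Map<String, List<String>> requires, Collection<String> enabled) {
        List<String> order = new ArrayList<>();
        Set<String> loaded = new HashSet<>();
        Set<String> path = new LinkedHashSet<>(); // plugins on the current DFS path, in visit order
        for (String id : enabled) {
            visit(id, requires, loaded, path, order);
        }
        return order;
    }

    private static void visit(String id, Map<String, List<String>> requires,
                              Set<String> loaded, Set<String> path, List<String> order) {
        if (loaded.contains(id)) {
            return; // already scheduled for loading
        }
        if (path.contains(id)) {
            // id is reachable from itself: report the chain that closes the cycle
            throw new IllegalStateException(
                    "Circular plugin dependency: " + String.join(" -> ", path) + " -> " + id);
        }
        List<String> deps = requires.get(id);
        if (deps == null) {
            throw new IllegalArgumentException("Unknown plugin: " + id);
        }
        path.add(id);
        for (String dep : deps) {
            visit(dep, requires, loaded, path, order);
        }
        path.remove(id);
        loaded.add(id);
        order.add(id); // every dependency of id is already in 'order'
    }
}
```

With illustrative ids, enabling only `analysis-fr` (which requires `nutch-extensionpoints` and `lib-lucene-analyzers`, the latter itself requiring `nutch-extensionpoints`) yields the load order `nutch-extensionpoints`, `lib-lucene-analyzers`, `analysis-fr`. Depth-first search makes a cycle visible as a plugin that is already on the current path; treating a missing dependency as an error is a design choice of this sketch.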