Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.
The following page has been changed by JeromeCharron:
http://wiki.apache.org/nutch/JeromeCharron

------------------------------------------------------------------------------
    * Some benchs LanguageIdentifierBenchs
    * Enhance the LanguageParseFilter by checking the validity of the parsed language string.
    * '''TODO''': Enhance the LanguageParseFilter by correlating (instead of taking only the first information available) all the clues available : DublinCore / Meta-Http-Equiv / Content-Language and statistical content analysis.
+   * '''TODO''': Improve API :
-   * '''TODO''': Improve API by returning an ordered list of candidate languages instead of just one.
+     * returns an ordered list of candidate languages instead of just one.
+     * See also Andrzej [http://www.nabble.com/Re%3A-lang-identifier-and-nutch-analyzer-in-trunk-p2533535.html comments] :
+       * exporting a list of supported languages,
+       * exporting an NGramProfile of the analyzed text,
+       * allow processing of chunks of input.
  * MultiLingualSupport proposal.
    * Framework for a multi-lingual analysis:
      * Analysis ExtensionPoint
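
The API improvements listed in the TODO (an ordered list of candidate languages, an exported list of supported languages, an exported n-gram profile of the analyzed text, and chunk-wise input) could look roughly like the sketch below. It is written in Python for brevity rather than in Nutch's Java. Every name in it (`RankedLanguageIdentifier`, `feed`, `candidates`, and so on) is hypothetical, and the scoring (the fraction of observed trigrams that also occur in a language's training profile) is a deliberately simple stand-in, not the algorithm the Nutch language identifier uses.

```python
from collections import Counter


def ngrams(text, n=3):
    """Return the lowercase character n-grams of a string."""
    text = text.lower()
    return (text[i:i + n] for i in range(len(text) - n + 1))


class RankedLanguageIdentifier:
    """Hypothetical sketch; not the Nutch LanguageIdentifier API."""

    def __init__(self, training_samples, n=3):
        self.n = n
        # One n-gram profile (a Counter) per supported language.
        self.profiles = {lang: Counter(ngrams(sample, n))
                         for lang, sample in training_samples.items()}
        self.seen = Counter()  # n-gram profile of the text analyzed so far
        self.tail = ""         # last n-1 characters, to bridge chunk boundaries

    def supported_languages(self):
        """Export the list of supported languages."""
        return sorted(self.profiles)

    def feed(self, chunk):
        """Process one chunk of input; may be called repeatedly."""
        text = self.tail + chunk
        self.seen.update(ngrams(text, self.n))
        self.tail = text[-(self.n - 1):] if self.n > 1 else ""

    def profile(self):
        """Export a copy of the n-gram profile of the analyzed text."""
        return Counter(self.seen)

    def candidates(self):
        """Return (language, score) pairs, best match first."""
        total = sum(self.seen.values()) or 1
        scores = {}
        for lang, prof in self.profiles.items():
            shared = sum(c for gram, c in self.seen.items() if gram in prof)
            scores[lang] = shared / total
        # Sort by descending score, then by language code for stable ties.
        return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

A caller would build the identifier from one training text per language, call `feed` once per chunk of input, and then read `candidates()` for the ranked list. Keeping the last `n-1` characters between calls means that feeding a text in chunks yields the same profile as feeding it whole.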
