Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The "LanguageIdentifier" page has been changed by AdamSmith:
http://wiki.apache.org/nutch/LanguageIdentifier?action=diff&rev1=5&rev2=6

  
  == Open Issues ==
  
-  * ''Lab'' test results are quite good (LanguageIdentifierBenchs), but 
''real-life'' results are not. The current version of the 
NewLanguageIdentifier expects the text it analyzes to be UTF-8 encoded, which 
is not the case for many fetched documents. The NewLanguageIdentifier 
therefore needs to refer to a {{{content-encoding}}} metadata value, which 
must be provided by a (todo) EncodingDetectorPlugin (see issue 
[[http://issues.apache.org/jira/browse/NUTCH-25|NUTCH-25]]).
+  * ''Lab'' test results are quite good (LanguageIdentifierBenchs), but 
''real-life'' results are not. The current version of the 
NewLanguageIdentifier expects the text it analyzes to be UTF-8 encoded, which 
is not the case for many fetched documents. The NewLanguageIdentifier 
therefore needs to refer to a {{{content-encoding}}} metadata value, which 
must be provided by a (todo) EncodingDetectorPlugin (see issue 
[[http://issues.apache.org/jira/browse/NUTCH-25|NUTCH-25]]).
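
The open issue above comes down to decoding fetched bytes with the charset the document declares rather than always assuming UTF-8. A minimal sketch of that idea in Java follows. The class and method names (DeclaredCharsetDecoder, resolve, decode) are hypothetical and are not part of Nutch or of the planned EncodingDetectorPlugin; the sketch also assumes the content-encoding value is simply a charset name, and that falling back to UTF-8 is acceptable when the value is missing or unknown.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnsupportedCharsetException;

// Hypothetical helper, for illustration only; not Nutch code.
final class DeclaredCharsetDecoder {

    // Map a declared encoding name to a Charset, falling back to UTF-8
    // (an assumption of this sketch) when the name is absent or unusable.
    public static Charset resolve(String declaredEncoding) {
        if (declaredEncoding == null || declaredEncoding.trim().isEmpty()) {
            return StandardCharsets.UTF_8;
        }
        try {
            return Charset.forName(declaredEncoding.trim());
        } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
            return StandardCharsets.UTF_8;
        }
    }

    // Decode raw fetched bytes with the declared charset so that the text
    // handed to a language identifier is a correctly decoded String.
    public static String decode(byte[] content, String declaredEncoding) {
        return new String(content, resolve(declaredEncoding));
    }

    public static void main(String[] args) {
        // "caf\u00e9" encoded as ISO-8859-1 is not valid UTF-8 input.
        byte[] latin1 = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(decode(latin1, "ISO-8859-1"));
    }
}
```

Decoding to a String before analysis keeps the identifier itself independent of the byte encoding; how the declared value is actually detected is left open here, as it is in the issue.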
  
