Detection of Language during crawling

Raj Chidara Mon, 11 Dec 2023 09:43:26 -0800

Hi

I am reading data from the files stored in segments and storing it in my own 
database. I am able to get the content, content type, and other metadata of 
the URLs. However, I do not know how to get the language of the content.

I passed the content to the Tika language identifier and got the language 
using its LanguageIdentifier class. However, I do not want to run this extra 
step, since Nutch already detects the language while crawling and parsing the 
content of each URL.
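
To illustrate what I am hoping for, here is a rough sketch (not actual Nutch 
code). It assumes the parse metadata read from a segment is available as a 
plain map and that the detected language sits under a key such as "lang". The 
key name, the class name, and the method name are all hypothetical; the 
detector argument only stands in for the Tika call I make today.

```java
import java.util.Map;
import java.util.function.Function;

public class LanguageLookup {
    // Hypothetical key name; the real one depends on how the parse
    // metadata was written into the segment.
    static final String LANG_KEY = "lang";

    // Return the language already stored with the parsed page, and only
    // fall back to running a detector on the content when none is stored.
    static String resolveLanguage(Map<String, String> parseMeta,
                                  String content,
                                  Function<String, String> detector) {
        String stored = parseMeta.get(LANG_KEY);
        if (stored != null && !stored.trim().isEmpty()) {
            return stored.trim();
        }
        // Fallback: the extra detection step I would like to avoid.
        return detector.apply(content);
    }
}
```

With something like this, the extra detection call would only run for pages 
where no language value was stored during parsing.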

Thanks and Regards

Raj Chidara