Laura,

>Hi all,
>
>I'm using Jobo for spidering web sites and Lucene for indexing. The
>problem is that I'd like to spider only Italian web sites.
>How can I discover the country of a web site?
>
>Do you know of a method that you can suggest?

The best method I know of is to use character n-grams, relying on the
frequencies of the n-grams that occur most often:
http://citeseer.nj.nec.com/context/698873/68861

Regards,
Ype
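
To make the n-gram idea concrete, here is a minimal Python sketch of one common variant: build a ranked profile of the most frequent character n-grams for each language, then pick the language whose profile is closest to the page's profile using an "out-of-place" rank distance. This is an illustration under assumptions and not necessarily the exact method behind the link above. The function names (ngram_profile, out_of_place, guess_language), the parameters n_max and top_k, and the tiny training samples are all hypothetical. Note that this identifies the language of the page text, which is only a proxy for the country of the site.

```python
from collections import Counter


def ngram_profile(text, n_max=3, top_k=300):
    """Return the top_k most frequent character n-grams (n = 1..n_max),
    most frequent first. Words are padded with '_' to mark boundaries."""
    counts = Counter()
    for word in text.lower().split():
        padded = f"_{word}_"
        for n in range(1, n_max + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Sort by descending count; break ties alphabetically so the
    # ranking is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:top_k]]


def out_of_place(doc_profile, lang_profile):
    """Sum of rank differences between two profiles. An n-gram missing
    from lang_profile costs the maximum penalty, len(lang_profile)."""
    rank = {gram: r for r, gram in enumerate(lang_profile)}
    max_penalty = len(lang_profile)
    return sum(
        abs(r - rank[gram]) if gram in rank else max_penalty
        for r, gram in enumerate(doc_profile)
    )


def guess_language(text, training_samples):
    """Return the key of training_samples whose n-gram profile is
    closest to the profile of text."""
    doc = ngram_profile(text)
    profiles = {lang: ngram_profile(s) for lang, s in training_samples.items()}
    return min(profiles, key=lambda lang: out_of_place(doc, profiles[lang]))


# Hypothetical, far-too-small training samples, for illustration only.
samples = {
    "it": "il gatto è sul tavolo e la ragazza legge un libro nella cucina",
    "en": "the cat is on the table and the girl reads a book in the kitchen",
}
print(guess_language("la ragazza è nella cucina", samples))
```

In practice each language profile would be built once from a large body of text and stored, and the spider would only follow or index pages whose text is classified as Italian. Very short pages give unreliable results with any frequency-based method.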
