Another possible improvement would be to keep an index of all the ngrams
(in all languages).
Each entry in the hashtable stores a list of <Lang, Freq> pairs, one for each
language whose profile file contains this ngram.
With this data structure we no longer need to loop over each profile and then
over each ngram of the document to identify; we only need to loop over the
ngrams of the document to identify.
I think it could greatly improve performance...
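The index described above could be sketched in Java roughly as follows. This is a minimal sketch under stated assumptions: a fixed ngram length, and profiles given as plain maps from ngram to frequency. The class NgramIndex, its methods and the scoring rule (sum of document count times profile frequency) are hypothetical illustrations, not the current identifier's code or its distance measure.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical inverted index over the ngrams of all language profiles.
 * Each ngram maps to the list of (language, frequency) pairs of the
 * profiles that contain it.
 */
class NgramIndex {

    /** One <Lang, Freq> pair stored under an ngram. */
    record LangFreq(String lang, int freq) {}

    private final int n; // ngram length (assumed fixed, e.g. 3)
    private final Map<String, List<LangFreq>> index = new HashMap<>();

    NgramIndex(int n) {
        this.n = n;
    }

    /** Adds every ngram of one language profile to the shared index. */
    void addProfile(String lang, Map<String, Integer> profile) {
        for (Map.Entry<String, Integer> e : profile.entrySet()) {
            index.computeIfAbsent(e.getKey(), k -> new ArrayList<>())
                 .add(new LangFreq(lang, e.getValue()));
        }
    }

    /** Counts the ngrams of a text (simple sliding window, no padding). */
    Map<String, Integer> ngrams(String text) {
        Map<String, Integer> counts = new HashMap<>();
        String s = text.toLowerCase();
        for (int i = 0; i + n <= s.length(); i++) {
            counts.merge(s.substring(i, i + n), 1, Integer::sum);
        }
        return counts;
    }

    /**
     * Loops only over the document's ngrams; each hash lookup returns the
     * languages containing that ngram. The score is a placeholder.
     */
    String identify(String text) {
        Map<String, Long> scores = new HashMap<>();
        for (Map.Entry<String, Integer> g : ngrams(text).entrySet()) {
            List<LangFreq> hits = index.get(g.getKey());
            if (hits == null) continue; // ngram absent from every profile
            for (LangFreq lf : hits) {
                scores.merge(lf.lang(), (long) g.getValue() * lf.freq(), Long::sum);
            }
        }
        String best = null;
        long bestScore = -1;
        for (Map.Entry<String, Long> e : scores.entrySet()) {
            if (e.getValue() > bestScore) {
                best = e.getKey();
                bestScore = e.getValue();
            }
        }
        return best; // null if no ngram of the text is in the index
    }
}
```

Building the index costs one pass over all profiles, done once at startup; identifying a document then costs one hash lookup per distinct ngram of the document, instead of a scan of every profile.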
Comments?

Jerome

-- 
http://motrech.free.fr/
http://frutch.free.fr/
