Hi, I've been using the language identifier plugin, which I think is very nice. I have a few questions which I hope someone might be able to answer:
1. Why is the NGramProfile getSimilarity() method not called from LanguageIdentifier?

2. The javadoc for the identify() method of LanguageIdentifier states that it returns null if the language is not recognised. However, the implementation can never return null. Has this ever worked? I think being able to recognise a "no match" case is an important part of the API (and would be easy to implement using a threshold value, if the NGramProfile getSimilarity() method were being used).

3. The identify(InputStream is) method of LanguageIdentifier (in SVN) assumes that the stream has a UTF-8 encoding, which will obviously break for other encodings. Would it not be better to use a reader? So the signature would be:

    public String identify(Reader reader) throws IOException

or add a charset argument:

    public String identify(InputStream is, String charsetName) throws IOException

Regards,
Tom
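
P.S. In case it helps the discussion, here is a rough sketch of what I mean in questions 2 and 3. It is not the plugin's code: the class name ThresholdIdentifierSketch, the Scorer interface and the threshold parameter are all made up for illustration. The scorer is a stand-in where a higher value means a closer match; I am not assuming that this is how NGramProfile getSimilarity() actually behaves.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: a threshold-based identifier that can report "no match". */
public class ThresholdIdentifierSketch {

    /** Hypothetical stand-in for a language profile; higher score = closer match. */
    public interface Scorer {
        float similarity(String text);
    }

    private final Map<String, Scorer> profiles = new LinkedHashMap<>();
    private final float threshold;

    public ThresholdIdentifierSketch(float threshold) {
        this.threshold = threshold;
    }

    public void addProfile(String lang, Scorer scorer) {
        profiles.put(lang, scorer);
    }

    /** Reader variant: the caller decides how bytes were decoded. */
    public String identify(Reader reader) throws IOException {
        StringBuilder text = new StringBuilder();
        char[] buf = new char[4096];
        int n;
        while ((n = reader.read(buf)) != -1) {
            text.append(buf, 0, n);
        }
        return identify(text.toString());
    }

    /** Charset variant: decode with the named charset instead of assuming UTF-8. */
    public String identify(InputStream is, String charsetName) throws IOException {
        return identify(new InputStreamReader(is, charsetName));
    }

    /** Returns the best-scoring language, or null if no score reaches the threshold. */
    public String identify(String text) {
        String best = null;
        float bestScore = threshold;
        for (Map.Entry<String, Scorer> e : profiles.entrySet()) {
            float score = e.getValue().similarity(text);
            if (score >= bestScore) {
                best = e.getKey();
                bestScore = score;
            }
        }
        return best;
    }
}
```

With something like this, a caller could test the result against null to detect unrecognised input, and pass either a Reader or an explicit charset name for content that is not UTF-8.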
