Hi,

I've been using the language identifier plugin, which I think is very
nice. I have a few questions which I hope someone might be able to
answer:

1. Why is the NGramProfile getSimilarity() method not called from
LanguageIdentifier?
2. The javadoc for the identify() method of LanguageIdentifier states
that it returns null if the language is not recognised. However, the
implementation can never return null. Has this ever worked? I think
being able to recognise a "no match" case is an important part of the
API (and would be easy to implement using a threshold value, if the
NGramProfile getSimilarity() method were being used).
3. The identify(InputStream is) method of LanguageIdentifier (in SVN)
assumes that the stream is UTF-8 encoded, so text in any other
encoding will be decoded incorrectly. Would it not be better to take
a Reader? The signature would then be:
    public String identify(Reader reader) throws IOException
Alternatively, a charset argument could be added:
    public String identify(InputStream is, String charsetName) throws IOException
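
To make points 2 and 3 concrete, here is a minimal sketch of what I have in mind. It is not the plugin's code: the class name IdentifySketch, the toy word-list profiles, the similarity() scoring and the THRESHOLD value are all assumptions made up for illustration, standing in for the real n-gram profiles and NGramProfile getSimilarity(). It only shows the shape of the API: the Reader overload does the work, the InputStream overload takes an explicit charset, and a score below the threshold gives null.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch only; not the plugin's LanguageIdentifier. */
public class IdentifySketch {

    // Toy profiles (assumption): small word lists standing in for n-gram profiles.
    private static final Map<String, Set<String>> PROFILES = new LinkedHashMap<>();
    static {
        PROFILES.put("en", new HashSet<>(Arrays.asList("the", "and", "is", "of")));
        PROFILES.put("de", new HashSet<>(Arrays.asList("der", "und", "ist", "das")));
    }

    // Assumed cut-off: below this score the input counts as "no match".
    private static final double THRESHOLD = 0.2;

    /** Stand-in similarity: fraction of tokens found in the profile. */
    static double similarity(Set<String> profile, String[] tokens) {
        if (tokens.length == 0) {
            return 0.0;
        }
        int hits = 0;
        for (String token : tokens) {
            if (profile.contains(token)) {
                hits++;
            }
        }
        return (double) hits / tokens.length;
    }

    /** Returns the best-scoring language code, or null if nothing reaches THRESHOLD. */
    public static String identify(Reader reader) throws IOException {
        StringBuilder text = new StringBuilder();
        char[] buffer = new char[4096];
        int read;
        while ((read = reader.read(buffer)) != -1) {
            text.append(buffer, 0, read);
        }
        String[] tokens = text.toString().toLowerCase(Locale.ROOT).trim().split("\\s+");

        String best = null;
        double bestScore = 0.0;
        for (Map.Entry<String, Set<String>> entry : PROFILES.entrySet()) {
            double score = similarity(entry.getValue(), tokens);
            if (score > bestScore) {
                bestScore = score;
                best = entry.getKey();
            }
        }
        return bestScore >= THRESHOLD ? best : null;
    }

    /** Decodes the stream with the caller's charset, then delegates to the Reader overload. */
    public static String identify(InputStream is, String charsetName) throws IOException {
        return identify(new InputStreamReader(is, charsetName));
    }
}
```

With this shape the caller decides how bytes become characters, for example identify(stream, "ISO-8859-1"), and can tell a real match from a "no match" by checking for null.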

Regards,

Tom
