Hi all,

As part of integrating language-detector into Tika (see TIKA-1723), I noticed 
TIKA-546 ("Add ability to create language profiles to tika-app").

If we switch over to language-detector, then the profile-building code from 
TIKA-546 no longer makes sense.

Also note that many language detectors require the full set of language data in 
order to select the most relevant (discriminating) ngrams, so the current 
support for building a profile from a single language's data won't work with 
them.
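
To make that concrete, here is a toy sketch in Java. It is only an 
illustration, not the algorithm used by language-detector or by the current 
Tika code, and the class and method names are made up. It assumes a trigram is 
"discriminating" for a language only if no other language's sample contains it, 
which is far cruder than what real detectors do:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy example; not part of Tika or language-detector.
public class DiscriminatingNgrams {

    // Count the character trigrams that occur in a text sample.
    static Map<String, Integer> trigramCounts(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + 3 <= text.length(); i++) {
            counts.merge(text.substring(i, i + 3), 1, Integer::sum);
        }
        return counts;
    }

    // Keep the trigrams of `lang` that appear in no other language's sample.
    // Assumption: "appears nowhere else" stands in for "discriminating".
    static Set<String> discriminating(String lang, Map<String, String> corpus) {
        Set<String> result = new TreeSet<>(trigramCounts(corpus.get(lang)).keySet());
        for (Map.Entry<String, String> other : corpus.entrySet()) {
            if (!other.getKey().equals(lang)) {
                result.removeAll(trigramCounts(other.getValue()).keySet());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> corpus = new LinkedHashMap<>();
        corpus.put("en", "mental");
        corpus.put("de", "mentor");

        // With both languages available, the shared trigrams drop out.
        System.out.println(discriminating("en", corpus)); // [nta, tal]

        // With only one language's data, nothing can be ruled out.
        System.out.println(discriminating("en",
                Collections.singletonMap("en", "mental"))); // [ent, men, nta, tal]
    }
}
```

With only the "en" sample available, every trigram survives, so nothing marks 
"men" or "ent" as useless for telling the two languages apart.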

So any suggestions for what to do? Leave the code as is, marked @Deprecated, 
even though the profiles generated won't be useful?

Or wait for pluggable detectors, so that someone could port the current Tika 
code? Then this profile-building support might still make sense, though it 
would need to move into that specific plugin.

-- Ken

