This is an automated email from the ASF dual-hosted git repository.

tallison pushed a change to branch TIKA-4662
in repository https://gitbox.apache.org/repos/asf/tika.git


    from b08ba012cf Merge branch 'main' into TIKA-4662
     add 3efb1c3945 TIKA-4662 -- clean up and rat

No new revisions were added by this update.

Summary of changes:
 .../advanced/charsoup-supported-languages.adoc     |  17 +
 .../advanced/lang-detection/flores-AUTOMATIC.log   |  15 +
 .../advanced/lang-detection/flores-SHORT_TEXT.log  |  15 +
 .../advanced/lang-detection/flores-STANDARD.log    |  15 +
 .../advanced/lang-detection/flores200-dev-eval.md  |  17 +
 .../lang-detection/language-drop-decisions.md      |  17 +
 .../short-text-language-decisions.md               |  17 +
 .../advanced/lang-detection/supported-languages.md |  17 +
 .../tika/langdetect/charsoup/confusables.txt       |  15 +
 .../src/test/python/filter_uppercase.py            |  15 +
 tika-ml/tika-ml-chardetect/pom.xml                 |  85 ++++-
 .../ml/chardetect/ByteNgramFeatureExtractor.java   | 121 -------
 .../tika/ml/chardetect/CharsetConfusables.java     | 309 -----------------
 .../chardetect/tools/BuildCharsetTrainingData.java |  21 +-
 .../chardetect/ByteNgramFeatureExtractorTest.java  |  82 +++--
 .../ml/chardetect/tools/TrainCharsetModel.java     | 364 ---------------------
 ...2273-encoding-detector-outside-static-init.json |   2 +-
 .../TIKA-2273-no-icu4j-encoding-detector.json      |   2 +-
 .../tika/server/core/LanguageResourceTest.java     |  10 +-
 .../src/test/resources/test-documents/english.txt  |   2 +-
 20 files changed, 294 insertions(+), 864 deletions(-)
 delete mode 100644 tika-ml/tika-ml-chardetect/src/main/java/org/apache/tika/ml/chardetect/ByteNgramFeatureExtractor.java
 delete mode 100644 tika-ml/tika-ml-chardetect/src/main/java/org/apache/tika/ml/chardetect/CharsetConfusables.java
 delete mode 100644 tika-ml/tika-ml-chardetect/src/test/java/org/apache/tika/ml/chardetect/tools/TrainCharsetModel.java
