[ 
https://issues.apache.org/jira/browse/TIKA-2520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16489731#comment-16489731
 ] 

ASF GitHub Bot commented on TIKA-2520:
--------------------------------------

chrismattmann commented on issue #237: TIKA-2520 optimize OptimaizeLangDetector 
default loadModel()
URL: https://github.com/apache/tika/pull/237#issuecomment-391847033
 
 
   So my problem was that Tesseract was installed on Mac OS X (thanks to
@dameikle for pointing this out on the list). I turned off Tesseract, built
again, and this patch / PR integrated fine:
   
   ```
   [INFO] Scanned 2 class file(s) for forbidden API invocations (in 0.04s), 0 error(s).
   [INFO] 
   [INFO] --- forbiddenapis:2.5:testCheck (default) @ tika-nlp ---
   [INFO] Scanning for classes to check...
   [INFO] Reading bundled API signatures: jdk-unsafe-1.7
   [INFO] Reading bundled API signatures: jdk-deprecated-1.7
   [INFO] Reading bundled API signatures: jdk-non-portable
   [INFO] Reading bundled API signatures: jdk-internal-1.7
   [INFO] Reading bundled API signatures: commons-io-unsafe-2.6
   [INFO] Loading classes to check...
   [INFO] Scanning classes for violations...
   [INFO] Scanned 1 class file(s) for forbidden API invocations (in 0.09s), 0 error(s).
   [INFO] 
   [INFO] --- maven-install-plugin:2.5.2:install (default-install) @ tika-nlp ---
   [INFO] Installing /Users/mattmann/tmp/tika2.0.0/tika-nlp/target/tika-nlp-1.19-SNAPSHOT.jar to /Users/mattmann/.m2/repository/org/apache/tika/tika-nlp/1.19-SNAPSHOT/tika-nlp-1.19-SNAPSHOT.jar
   [INFO] Installing /Users/mattmann/tmp/tika2.0.0/tika-nlp/pom.xml to /Users/mattmann/.m2/repository/org/apache/tika/tika-nlp/1.19-SNAPSHOT/tika-nlp-1.19-SNAPSHOT.pom
   [INFO] Installing /Users/mattmann/tmp/tika2.0.0/tika-nlp/target/tika-nlp-1.19-SNAPSHOT-jar-with-dependencies.jar to /Users/mattmann/.m2/repository/org/apache/tika/tika-nlp/1.19-SNAPSHOT/tika-nlp-1.19-SNAPSHOT-jar-with-dependencies.jar
   [INFO] 
   [INFO] ------------------------------------------------------------------------
   [INFO] Building Apache Tika 1.19-SNAPSHOT
   [INFO] ------------------------------------------------------------------------
   [INFO] 
   [INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ tika ---
   [INFO] 
   [INFO] --- maven-enforcer-plugin:3.0.0-M1:enforce (enforce) @ tika ---
   [INFO] 
   [INFO] --- maven-remote-resources-plugin:1.5:process (default) @ tika ---
   [INFO] 
   [INFO] --- maven-site-plugin:3.4:attach-descriptor (attach-descriptor) @ tika ---
   [INFO] 
   [INFO] --- forbiddenapis:2.5:check (default) @ tika ---
   [INFO] Skipping execution for packaging "pom"
   [INFO] 
   [INFO] --- forbiddenapis:2.5:testCheck (default) @ tika ---
   [INFO] Skipping execution for packaging "pom"
   [INFO] 
   [INFO] --- maven-install-plugin:2.5.2:install (default-install) @ tika ---
   [INFO] Installing /Users/mattmann/tmp/tika2.0.0/pom.xml to /Users/mattmann/.m2/repository/org/apache/tika/tika/1.19-SNAPSHOT/tika-1.19-SNAPSHOT.pom
   [INFO] ------------------------------------------------------------------------
   [INFO] Reactor Summary:
   [INFO] 
   [INFO] Apache Tika parent ................................. SUCCESS [  1.515 s]
   [INFO] Apache Tika core ................................... SUCCESS [ 28.678 s]
   [INFO] Apache Tika parsers ................................ SUCCESS [03:50 min]
   [INFO] Apache Tika XMP .................................... SUCCESS [  2.401 s]
   [INFO] Apache Tika serialization .......................... SUCCESS [  1.925 s]
   [INFO] Apache Tika batch .................................. SUCCESS [01:55 min]
   [INFO] Apache Tika language detection ..................... SUCCESS [  2.867 s]
   [INFO] Apache Tika application ............................ SUCCESS [01:08 min]
   [INFO] Apache Tika OSGi bundle ............................ SUCCESS [ 39.266 s]
   [INFO] Apache Tika translate .............................. SUCCESS [  7.387 s]
   [INFO] Apache Tika server ................................. SUCCESS [ 28.187 s]
   [INFO] Apache Tika examples ............................... SUCCESS [ 11.841 s]
   [INFO] Apache Tika Java-7 Components ...................... SUCCESS [  2.566 s]
   [INFO] Apache Tika eval ................................... SUCCESS [ 30.226 s]
   [INFO] Apache Tika Deep Learning (powered by DL4J) ........ SUCCESS [01:02 min]
   [INFO] Apache Tika Natural Language Processing ............ SUCCESS [ 23.399 s]
   [INFO] Apache Tika ........................................ SUCCESS [  0.025 s]
   [INFO] ------------------------------------------------------------------------
   [INFO] BUILD SUCCESS
   [INFO] ------------------------------------------------------------------------
   [INFO] Total time: 10:59 min
   [INFO] Finished at: 2018-05-24T13:19:36-07:00
   [INFO] Final Memory: 186M/1733M
   [INFO] ------------------------------------------------------------------------
   ```
   
   However, @kkrugler's point is well taken. We should address that before
merging.
   



> OptimaizeLangDetector#loadModels() should not be called for every single 
> langdetect HTTP request
> ------------------------------------------------------------------------------------------------
>
>                 Key: TIKA-2520
>                 URL: https://issues.apache.org/jira/browse/TIKA-2520
>             Project: Tika
>          Issue Type: Improvement
>          Components: server
>    Affects Versions: 1.16
>            Reporter: Vincent van Donselaar
>            Priority: Minor
>              Labels: performance
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> Tika REST server's `/language` resource invokes the relatively heavy 
> `loadModels` operation for every language detect call:
> {code:title=LanguageResource.java}
> public String detect(final String string) throws IOException {
>       LanguageResult language = new OptimaizeLangDetector().loadModels().detect(string);
>       String detectedLang = language.getLanguage();
>       LOG.info("Detecting language for incoming resource: [{}]", detectedLang);
>       return detectedLang;
> }
> {code}
> This could be optimized by (lazily?) loading the models only once and keeping
> them in memory. I assume the `LanguageDetector` is not thread-safe, so I
> expect this requires an ExecutorService with language detectors.
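
The load-once idea in the issue description can be sketched roughly as follows. This is a minimal illustration, not the change proposed in the PR: `StubDetector` and `SharedDetectorSketch` are hypothetical stand-ins rather than Tika classes, and the sketch uses a lazily initialized shared instance guarded by `synchronized` instead of the ExecutorService the reporter mentions, on the assumption that the detector is not thread-safe.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a detector whose model loading is expensive.
// It is NOT Tika's API; it only mimics the load-then-detect shape in the issue.
final class StubDetector {
    // Counts model loads so the cost of repeated loading is visible.
    static final AtomicInteger LOADS = new AtomicInteger();

    StubDetector loadModels() {
        LOADS.incrementAndGet(); // pretend this is the slow part
        return this;
    }

    String detect(String text) {
        // Toy rule purely for illustration.
        return text.contains(" het ") ? "nl" : "en";
    }
}

final class SharedDetectorSketch {
    // Initialization-on-demand holder: the JVM runs this initializer once,
    // lazily and thread-safely, the first time Holder is referenced.
    private static final class Holder {
        static final StubDetector INSTANCE = new StubDetector().loadModels();
    }

    // Access is serialized because the detector is assumed not to be thread-safe.
    static String detect(String text) {
        synchronized (Holder.INSTANCE) {
            return Holder.INSTANCE.detect(text);
        }
    }
}
```

Calling `SharedDetectorSketch.detect(...)` any number of times loads the models once. The lock serializes requests, so a per-thread instance or a small pool of detectors may be a better fit if detection itself becomes the bottleneck.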


