[
https://issues.apache.org/jira/browse/TIKA-4666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18059210#comment-18059210
]
Hudson commented on TIKA-4666:
------------------------------
SUCCESS: Integrated in Jenkins build Tika ยป tika-main-jdk17 #1212 (See
[https://ci-builds.apache.org/job/Tika/job/tika-main-jdk17/1212/])
TIKA-4666 - add VLM parsers (Claude, Gemini, OpenAI) (#2614) (github:
[https://github.com/apache/tika/commit/2c98c636774ad85d7c1878adc90862ee865dacf0])
* (add)
tika-parsers/tika-parsers-ml/tika-parser-vlm-ocr-module/src/test/java/org/apache/tika/parser/vlm/GeminiVLMParserTest.java
* (add) docs/modules/ROOT/examples/claude-vlm-full.json
* (add)
tika-parsers/tika-parsers-ml/tika-parser-vlm-ocr-module/src/main/java/org/apache/tika/parser/vlm/AbstractVLMParser.java
* (add) docs/modules/ROOT/examples/openai-vlm-basic.json
* (add) docs/modules/ROOT/pages/advanced/local-vlm-server.adoc
* (add) docs/modules/ROOT/examples/gemini-vlm-full.json
* (add)
tika-parsers/tika-parsers-ml/tika-parser-vlm-ocr-module/src/main/java/org/apache/tika/parser/vlm/VLMOCRConfig.java
* (add)
tika-parsers/tika-parsers-ml/tika-parser-vlm-ocr-module/src/test/java/org/apache/tika/parser/vlm/MarkdownToXHTMLEmitterTest.java
* (edit) tika-parsers/tika-parsers-ml/pom.xml
* (add)
tika-parsers/tika-parsers-ml/tika-parser-vlm-ocr-module/src/test/java/org/apache/tika/parser/vlm/ClaudeVLMParserTest.java
* (add) docs/modules/ROOT/examples/vlm-pdf-parsing.json
* (add)
tika-parsers/tika-parsers-ml/tika-parser-vlm-ocr-module/src/main/java/org/apache/tika/parser/vlm/ClaudeVLMParser.java
* (add)
tika-parsers/tika-parsers-ml/tika-parser-vlm-ocr-module/src/test/java/org/apache/tika/parser/vlm/OpenAIVLMParserTest.java
* (add) docs/modules/ROOT/examples/claude-vlm-basic.json
* (add) docs/modules/ROOT/examples/openai-vlm-full.json
* (add) docs/modules/ROOT/examples/gemini-vlm-basic.json
* (add) tika-parsers/tika-parsers-ml/tika-parser-vlm-ocr-module/pom.xml
* (edit) docs/modules/ROOT/pages/advanced/index.adoc
* (add) docs/modules/ROOT/pages/configuration/parsers/vlm-parsers.adoc
* (add)
tika-parsers/tika-parsers-ml/tika-parser-vlm-ocr-module/src/main/java/org/apache/tika/parser/vlm/GeminiVLMParser.java
* (add)
tika-parsers/tika-parsers-ml/tika-parser-vlm-ocr-module/src/main/java/org/apache/tika/parser/vlm/OpenAIVLMParser.java
* (edit) docs/modules/ROOT/nav.adoc
* (add)
tika-parsers/tika-parsers-ml/tika-parser-vlm-ocr-module/src/main/java/org/apache/tika/parser/vlm/MarkdownToXHTMLEmitter.java
> Add VLM/modern OCR options parsers in 4.x
> -----------------------------------------
>
> Key: TIKA-4666
> URL: https://issues.apache.org/jira/browse/TIKA-4666
> Project: Tika
> Issue Type: New Feature
> Reporter: Tim Allison
> Priority: Major
>
> Along the lines of TIKA-4665, it would be great if we could integrate with
> modern vlms. Ideally, these would call out to vlm apis, but we could include
> examples of running qwen or jinaai locally in a python server.
> Again, as with our old deeplearning4j module, I think we should offer
> examples of these kinds of integrations whether or not they get picked up in
> production.
> Modern VLMs can be a swap in for tesseract ocr depending on the prompt. Or,
> they can be used to label images... user chooses what the prompt will be.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)