[
https://issues.apache.org/jira/browse/TIKA-4665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison updated TIKA-4665:
------------------------------
Issue Type: New Feature (was: Task)
> Add chunking and inference handling poc in 4.x
> ----------------------------------------------
>
> Key: TIKA-4665
> URL: https://issues.apache.org/jira/browse/TIKA-4665
> Project: Tika
> Issue Type: New Feature
> Reporter: Tim Allison
> Priority: Major
>
> We should offer basic chunking (based on markdown) and basic integration with
> the OpenAI API spec for inference so that we can do all the work and then emit
> the parsed text, metadata, chunks, and vectors.
> In some ways, this modernizes the deeplearning4j modules that we no longer
> have in 4.x. Obviously, the capability is entirely different, but I think we
> should leave room for these types of PoC integrations. This integration at
> least will be exceedingly light because it relies on external inference
> services. We will not be downloading gigabytes of model files. :D
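
For illustration only, here is a minimal sketch of the flow described above: split markdown into chunks at headings, build an embeddings request in the shape used by OpenAI-compatible services, and pair the returned vectors with the chunks. It is written in Python for brevity and is not Tika code. The function names, the max_chars limit, the model name, and the emitted {"text", "vector"} structure are all assumptions made for this example. No network call is made; the response is a hand-written stand-in.

```python
import json
import re

# ATX-style markdown heading: 1-6 '#' characters followed by whitespace and text.
# Limitation of this sketch: '#' lines inside fenced code blocks are also matched.
HEADING = re.compile(r"^#{1,6}\s+\S")


def chunk_markdown(text, max_chars=1000):
    """Split markdown at headings, then cap each chunk at max_chars characters."""
    sections, current = [], []
    for line in text.splitlines():
        # A heading starts a new section, unless nothing has been collected yet.
        if HEADING.match(line) and current:
            sections.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        sections.append("\n".join(current).strip())

    chunks = []
    for section in sections:
        if not section:
            continue
        # Naive hard split for oversized sections; a real chunker would
        # prefer paragraph or sentence boundaries.
        for start in range(0, len(section), max_chars):
            chunks.append(section[start:start + max_chars])
    return chunks


def build_embeddings_request(chunks, model):
    """Build a JSON body for POST <base_url>/v1/embeddings (OpenAI-style)."""
    return json.dumps({"model": model, "input": chunks})


def attach_vectors(chunks, response_body):
    """Pair each chunk with its vector using the 'index' field of the response."""
    data = json.loads(response_body)["data"]
    vectors = {item["index"]: item["embedding"] for item in data}
    return [{"text": chunk, "vector": vectors[i]} for i, chunk in enumerate(chunks)]


if __name__ == "__main__":
    md = "# Title\nintro\n\n## Part\nbody text\n"
    chunks = chunk_markdown(md)
    # "example-embedding-model" is a placeholder, not a real model name.
    print(build_embeddings_request(chunks, "example-embedding-model"))
    # Hand-written stand-in for a service response.
    fake = json.dumps({"data": [{"index": 0, "embedding": [0.1]},
                                {"index": 1, "embedding": [0.2]}]})
    print(attach_vectors(chunks, fake))
```

Because the only dependency is an HTTP endpoint that accepts this request shape, any compatible inference service could be swapped in without shipping model files.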