[ https://issues.apache.org/jira/browse/TIKA-4665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18059477#comment-18059477 ]

ASF GitHub Bot commented on TIKA-4665:
--------------------------------------

tballison merged PR #2613:
URL: https://github.com/apache/tika/pull/2613




> Add chunking and inference handling poc in 4.x
> ----------------------------------------------
>
>                 Key: TIKA-4665
>                 URL: https://issues.apache.org/jira/browse/TIKA-4665
>             Project: Tika
>          Issue Type: New Feature
>            Reporter: Tim Allison
>            Priority: Major
>
> We should offer basic chunking (based on markdown) and basic integration with 
> the OpenAI spec for inference so that we can do all the work and then emit 
> the parsed text+metadata+chunks+vectors.
> In some ways, this modernizes the deeplearning4j modules that we no longer 
> have in 4.x. Obviously, the capability is entirely different, but I think we 
> should leave room for these types of PoC integrations. At least this 
> integration will be exceedingly light because it relies on external inference 
> services. We will not be downloading gigs of model files. :D



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
