[ https://issues.apache.org/jira/browse/TIKA-4690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18065835#comment-18065835 ]

ASF GitHub Bot commented on TIKA-4690:
--------------------------------------

tballison merged PR #2693:
URL: https://github.com/apache/tika/pull/2693




> Add generative language model in 4.x
> ------------------------------------
>
>                 Key: TIKA-4690
>                 URL: https://issues.apache.org/jira/browse/TIKA-4690
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Minor
>
> Finally realized that we can play all we want with logits from the language 
> detector, but it is not a great approach for "languagey/junk" detection. On 
> this ticket, we'll add a generative model trained on the same languages as 
> the language detector so that we can get a better sense of, for example, 
> "Lang detector said Thai, how likely is it to actually be Thai?"



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
