Thejan Wijesinghe created TIKA-2720:
---------------------------------------
Summary: A parser to output universal sentence encodings to text
Key: TIKA-2720
URL: https://issues.apache.org/jira/browse/TIKA-2720
Project: Tika
Issue Type: New Feature
Components: tika-dl
Reporter: Thejan Wijesinghe
Fix For: 2.0
This parser encodes text into high-dimensional vectors that can be used for
text classification, semantic similarity, clustering and other natural language
tasks. The model is trained and optimized for greater-than-word-length text,
such as sentences, phrases or short paragraphs. It is trained on a variety of
data sources and tasks, with the aim of accommodating a wide range of natural
language understanding tasks. The input is variable-length English text and the
output is a 512-dimensional vector.
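
As a rough illustration of the semantic similarity use case, the sketch below
compares 512-dimensional vectors with cosine similarity. It does not call the
parser or the model: the vectors are random placeholders standing in for
encoder output, and the names EMBEDDING_DIM and cosine_similarity are
hypothetical, chosen only for this example.

```python
import math
import random

# Output size stated in the issue description.
EMBEDDING_DIM = 512


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Assumption: placeholder vectors, NOT real encoder output.
rng = random.Random(0)
vec_a = [rng.gauss(0.0, 1.0) for _ in range(EMBEDDING_DIM)]
# A slightly perturbed copy, standing in for a paraphrase of the same sentence.
vec_b = [x + rng.gauss(0.0, 0.1) for x in vec_a]
# An independent vector, standing in for an unrelated sentence.
vec_c = [rng.gauss(0.0, 1.0) for _ in range(EMBEDDING_DIM)]

print("near-duplicate:", cosine_similarity(vec_a, vec_b))
print("unrelated:     ", cosine_similarity(vec_a, vec_c))
```

With real encodings, a score closer to 1 would suggest that two texts are
closer in meaning; any threshold for "similar" would have to be tuned per task.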