Thejan Wijesinghe created TIKA-2720:
---------------------------------------

             Summary: A parser to output universal sentence encodings to text
                 Key: TIKA-2720
                 URL: https://issues.apache.org/jira/browse/TIKA-2720
             Project: Tika
          Issue Type: New Feature
          Components: tika-dl
            Reporter: Thejan Wijesinghe
             Fix For: 2.0


This parser encodes text into high-dimensional vectors that can be used for
text classification, semantic similarity, clustering and other natural language
tasks. The model is trained and optimized for text longer than a single word,
such as sentences, phrases or short paragraphs. It is trained on several data
sources and several tasks, with the aim of dynamically accommodating a wide
range of natural language understanding tasks. The input is variable-length
English text and the output is a 512-dimensional vector.
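
As a rough illustration of how the output vectors could be used for semantic
similarity, the sketch below compares encodings by cosine similarity. The
function name and the toy 4-dimensional vectors are assumptions made for
illustration only; they are not part of the proposed parser, and real
encodings would have 512 components.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical stand-ins for sentence encodings (real ones would be 512-d).
sentence_a = [0.1, 0.3, -0.2, 0.4]
sentence_b = [0.1, 0.25, -0.1, 0.5]
sentence_c = [-0.4, 0.1, 0.3, -0.2]

# A higher score suggests the two encoded texts are closer in meaning.
print(cosine_similarity(sentence_a, sentence_b))
print(cosine_similarity(sentence_a, sentence_c))
```

Cosine similarity ignores vector length and compares direction only, which is
one common choice for comparing sentence encodings; other distance measures
could be substituted.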



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
