[ https://issues.apache.org/jira/browse/TIKA-2720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Allison updated TIKA-2720:
------------------------------
    Fix Version/s:     (was: 2.0.0)
                   2.0.0-BETA

> A parser to output universal sentence encodings to text
> -------------------------------------------------------
>
>                 Key: TIKA-2720
>                 URL: https://issues.apache.org/jira/browse/TIKA-2720
>             Project: Tika
>          Issue Type: New Feature
>          Components: tika-dl
>            Reporter: Thejan Wijesinghe
>            Priority: Major
>             Fix For: 2.0.0-BETA
>
>
> This parser encodes a text into high dimensional vectors that can be used for
> text classification, semantic similarity, clustering and other natural
> language tasks. The model is trained and optimized for greater-than-word
> length text, such as sentences, phrases or short paragraphs. It is trained on
> a variety of data sources and a variety of tasks with the aim of dynamically
> accommodating a wide variety of natural language understanding tasks. The
> input is variable length English text and the output is a 512 dimensional
> vector.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
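
For readers wondering how the 512-dimensional output might be used for the
semantic similarity task mentioned in the description, here is a minimal
sketch in Java. It is not part of the proposed parser or of any Tika API: the
class name EmbeddingSimilarity, the method cosine, and the placeholder vectors
are all hypothetical, and real vectors would come from whatever the parser
emits. It only shows that two sentence encodings of equal length can be
compared with cosine similarity.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Tika: compares two sentence encodings.
public class EmbeddingSimilarity {
    // Vector size stated in the issue description.
    static final int DIM = 512;

    /** Cosine similarity of two equal-length vectors; 0.0 if either has zero norm. */
    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vector lengths differ");
        }
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // similarity is undefined for a zero vector; report 0
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Placeholder vectors standing in for real encoder output (assumption).
        float[] first = new float[DIM];
        float[] second = new float[DIM];
        Arrays.fill(first, 1.0f);
        Arrays.fill(second, 2.0f);
        // Vectors pointing in the same direction score close to 1.
        System.out.println(cosine(first, second));
    }
}
```

Scores near 1 indicate encodings pointing in nearly the same direction, and
scores near 0 indicate unrelated ones. Whether cosine similarity is the right
measure for a given task is a choice for the caller, not something the issue
specifies.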