[
https://issues.apache.org/jira/browse/TIKA-2720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16601694#comment-16601694
]
ASF GitHub Bot commented on TIKA-2720:
--------------------------------------
ThejanW commented on issue #248: Fix for TIKA-2720 [WIP]
URL: https://github.com/apache/tika/pull/248#issuecomment-417965702
The sentences in the above comment are passed through the encoder, which
outputs an array of 512 floats for each sentence. Once I have those arrays, I
calculate the cosine similarity between every pair of them. Below are the
best-matching sentence pairs with their cosine similarities.
Each segment lists the two sentences followed by their cosine similarity. For
example, the first segment has the sentences "How old are you?" and "What is
your age?" with a cosine similarity of 0.8516871929168701, which is the
highest. The list continues in descending order of similarity.
```
How old are you?
What is your age?
0.8516871929168701
How old are you?
How old did you turn?
0.7483202219009399
What is your age?
How old did you turn?
0.6784225106239319
Heavy rain slammed the mid-Atlantic United States on Monday, delaying
flights, forming sinkholes
Recently a lot of hurricanes have hit the US
0.6395097374916077
The Samsung Galaxy S10 has the potential to be the most exciting phone of
2019
Android beats iOS in smartphone loyalty, study finds
0.6229119300842285
Heavy rain slammed the mid-Atlantic United States on Monday, delaying
flights, forming sinkholes
News showed, violent floodwaters surging down main Streets
0.6069092154502869
How old are you?
When is your birthday?
0.5812650322914124
What is your age?
When is your birthday?
0.5723845362663269
Android beats iOS in smartphone loyalty, study finds
Apple became the world’s first trillion-dollar public company
0.5713004469871521
Green tea contains bioactive compounds that improve health
Is paleo better than keto?
0.5498321652412415
News showed, violent floodwaters surging down main Streets
Recently a lot of hurricanes have hit the US
0.534430205821991
The Samsung Galaxy S10 has the potential to be the most exciting phone of
2019
IPhone X includes a 5.8-inch edge-to-edge display which covers the entire
front of the phone.
0.5117762088775635
Heavy rain slammed the mid-Atlantic United States on Monday, delaying
flights, forming sinkholes
Multiple lines of scientific evidence show that the climate system is warming
0.5018186569213867
Android beats iOS in smartphone loyalty, study finds
IPhone X includes a 5.8-inch edge-to-edge display which covers the entire
front of the phone.
0.4970431923866272
Green tea contains bioactive compounds that improve health
Yoga has been shown to help people reduce anxiety
0.4776824116706848
How old did you turn?
When is your birthday?
0.46567195653915405
The Samsung Galaxy S10 has the potential to be the most exciting phone of
2019
Apple became the world’s first trillion-dollar public company
0.4522799849510193
Recently a lot of hurricanes have hit the US
Multiple lines of scientific evidence show that the climate system is warming
0.4517837166786194
With roads covered with slippery snow and ice, can challenge even the most
experienced driver.
Heavy rain slammed the mid-Atlantic United States on Monday, delaying
flights, forming sinkholes
0.42890870571136475
An ounce of prevention is worth a pound of cure
Green tea contains bioactive compounds that improve health
0.38761529326438904
An ounce of prevention is worth a pound of cure
Yoga has been shown to help people reduce anxiety
0.38396507501602173
News showed, violent floodwaters surging down main Streets
Multiple lines of scientific evidence show that the climate system is warming
0.3623693287372589
IPhone X includes a 5.8-inch edge-to-edge display which covers the entire
front of the phone.
Apple became the world’s first trillion-dollar public company
0.361715167760849
With roads covered with slippery snow and ice, can challenge even the most
experienced driver.
News showed, violent floodwaters surging down main Streets
0.35203033685684204
Yoga has been shown to help people reduce anxiety
Is paleo better than keto?
0.34740278124809265
```
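For readers who want to reproduce the ranking step, the pairwise comparison described above can be sketched roughly as follows. This is a minimal illustration, not the code from the pull request: the function names `cosine_similarity` and `rank_pairs` are made up for this example, and the 3-element toy vectors only stand in for the 512-float arrays the encoder would produce.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Assumption for this sketch: treat a zero vector as unrelated.
        return 0.0
    return dot / (norm_u * norm_v)


def rank_pairs(embeddings):
    """Score every unordered sentence pair, highest similarity first.

    `embeddings` maps each sentence to its vector (hypothetical input
    shape; in the real setup the encoder would supply 512 floats).
    """
    scored = [
        (s1, s2, cosine_similarity(embeddings[s1], embeddings[s2]))
        for s1, s2 in combinations(embeddings, 2)
    ]
    scored.sort(key=lambda item: item[2], reverse=True)
    return scored


# Toy vectors, NOT real encoder output.
toy = {
    "How old are you?": [1.0, 0.0, 0.0],
    "What is your age?": [0.9, 0.1, 0.0],
    "Is paleo better than keto?": [0.0, 0.0, 1.0],
}

# Print each pair in the same three-line layout as the listing above.
for first, second, score in rank_pairs(toy):
    print(first)
    print(second)
    print(score)
```

With n sentences this compares n * (n - 1) / 2 pairs, which is fine for a handful of sentences like the ones here; a larger corpus would probably call for a vectorized or approximate nearest-neighbour approach instead.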
> A parser to output universal sentence encodings to text
> -------------------------------------------------------
>
> Key: TIKA-2720
> URL: https://issues.apache.org/jira/browse/TIKA-2720
> Project: Tika
> Issue Type: New Feature
> Components: tika-dl
> Reporter: Thejan Wijesinghe
> Priority: Major
> Fix For: 2.0
>
>
> This parser encodes text into high-dimensional vectors that can be used for
> text classification, semantic similarity, clustering and other natural
> language tasks. The model is trained and optimized for greater-than-word
> length text, such as sentences, phrases or short paragraphs. It is trained on
> a variety of data sources and tasks, with the aim of dynamically
> accommodating a wide range of natural language understanding tasks. The
> input is variable-length English text and the output is a 512-dimensional
> vector.