[ https://issues.apache.org/jira/browse/TIKA-2720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16601688#comment-16601688 ]

ASF GitHub Bot commented on TIKA-2720:
--------------------------------------

ThejanW commented on issue #248: Fix for TIKA-2720 [WIP]
URL: https://github.com/apache/tika/pull/248#issuecomment-417964712
 
 
   **The set of test sentences is as follows (treat each "About..." line as the topic of the sentence group beneath it):**
   
   About age
   > How old are you?
   > What is your age?
   > How old did you turn?
   > When is your birthday?
   
   About smart phones
   > The Samsung Galaxy S10 has the potential to be the most exciting phone of 2019
   > Android beats iOS in smartphone loyalty, study finds
   > IPhone X includes a 5.8-inch edge-to-edge display which covers the entire front of the phone.
   > Apple became the world’s first trillion-dollar public company
   
   About weather
   > With roads covered with slippery snow and ice, can challenge even the most experienced driver.
   > Heavy rain slammed the mid-Atlantic United States on Monday, delaying flights, forming sinkholes
   > News showed, violent floodwaters surging down main Streets
   > Recently a lot of hurricanes have hit the US
   > Multiple lines of scientific evidence show that the climate system is warming
   
   About health
   > An ounce of prevention is worth a pound of cure
   > Green tea contains bioactive compounds that improve health
   > Yoga has been shown to help people reduce anxiety
   > Is paleo better than keto?
   
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


> A parser to output universal sentence encodings to text
> -------------------------------------------------------
>
>                 Key: TIKA-2720
>                 URL: https://issues.apache.org/jira/browse/TIKA-2720
>             Project: Tika
>          Issue Type: New Feature
>          Components: tika-dl
>            Reporter: Thejan Wijesinghe
>            Priority: Major
>             Fix For: 2.0
>
>
> This parser encodes text into high-dimensional vectors that can be used for
> text classification, semantic similarity, clustering and other natural
> language tasks. The model is trained and optimized for text longer than a
> single word, such as sentences, phrases or short paragraphs. It is trained on
> a variety of data sources and tasks with the aim of dynamically
> accommodating a wide variety of natural language understanding tasks. The
> input is variable-length English text and the output is a 512-dimensional
> vector.
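Because every input, whatever its length, maps to a vector of the same 512 dimensions, semantic similarity between two texts reduces to comparing their vectors, commonly with cosine similarity. A minimal sketch, assuming the encodings are available as plain Python float lists; `EMBEDDING_DIM`, `normalise` and `rank_by_similarity` are hypothetical names for illustration, not part of the Tika API, and the vectors are synthetic.

```python
import math
from typing import List, Sequence, Tuple

EMBEDDING_DIM = 512  # output size stated in the issue description


def normalise(vec: Sequence[float]) -> List[float]:
    """Check the dimension and scale the vector to unit length."""
    if len(vec) != EMBEDDING_DIM:
        raise ValueError(f"expected {EMBEDDING_DIM} dims, got {len(vec)}")
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("zero vector has no direction")
    return [x / norm for x in vec]


def rank_by_similarity(
    query: Sequence[float], candidates: List[Tuple[str, Sequence[float]]]
) -> List[Tuple[str, float]]:
    """Return (label, cosine similarity) pairs, most similar first."""
    q = normalise(query)
    scored = []
    for label, vec in candidates:
        c = normalise(vec)
        # For unit-length vectors the dot product equals the cosine similarity.
        scored.append((label, sum(a * b for a, b in zip(q, c))))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Synthetic 512-dim vectors (hypothetical, not real encoder output).
query = [1.0] * 256 + [0.0] * 256
near = [0.9] * 256 + [0.1] * 256
far = [0.0] * 256 + [1.0] * 256
print(rank_by_similarity(query, [("far", far), ("near", near)]))
```

Normalising to unit length turns each comparison into a plain dot product; when many stored encodings are scored against the same query, they could be normalised once and cached rather than on every call as in this sketch.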



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
