Vivek created TIKA-3024:
----------------------------
Summary: Extra whitespace appended within a tag element's text
Key: TIKA-3024
URL: https://issues.apache.org/jira/browse/TIKA-3024
Project: Tika
Issue Type: Bug
Affects Versions: 1.20
Reporter: Vivek
Website: [http://www.thevanitycase.com/about-us.php]
When parsing the content of the page with the Tika parser, an extra whitespace
character (" ") is inserted into the text "Tel: +91-22-61801700". That is,
Expected text: "<text before this>Tel: +91-22-61801700<text after this>"
Actual text: "<text before this>Tel: +91-22-6180170 0<text after this>"
The JS path of the element:
body > div > div:nth-child(6) > div > div.footer-full.footer-btm > div > p > span
Usually, double whitespace is appended between the text of adjacent tag
elements. But here it is inserted within the text of a single tag element.
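
To make the discrepancy easier to pin down, here is a minimal sketch (plain
Python, not part of Tika) that compares the expected and actual strings and
reports where whitespace was inserted. The helper name
find_inserted_whitespace is hypothetical and only for illustration; it assumes
the two strings differ by inserted whitespace alone.

```python
from typing import List


def find_inserted_whitespace(expected: str, actual: str) -> List[int]:
    """Return offsets in `expected` before which `actual` has extra whitespace.

    Raises ValueError if the strings differ by anything other than
    inserted whitespace.
    """
    offsets: List[int] = []
    i = 0  # current position in `expected`
    for ch in actual:
        if i < len(expected) and ch == expected[i]:
            # Character matches the expected text; advance.
            i += 1
        elif ch.isspace():
            # Extra whitespace; record the offset once per run.
            if not offsets or offsets[-1] != i:
                offsets.append(i)
        else:
            raise ValueError("strings differ by more than whitespace")
    if i != len(expected):
        raise ValueError("actual text is missing expected characters")
    return offsets


if __name__ == "__main__":
    expected = "Tel: +91-22-61801700"
    actual = "Tel: +91-22-6180170 0"
    print(find_inserted_whitespace(expected, actual))
```

For the strings quoted in this report, the helper reports a single insertion
point just before the final "0" of the phone number.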
--
This message was sent by Atlassian Jira
(v8.3.4#803005)