[ https://issues.apache.org/jira/browse/TIKA-532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ken Krugler closed TIKA-532.
----------------------------
Resolution: Duplicate
As per link, this is a duplicate of [TIKA-394].
> missing spaces in text extraction of BodyContentHandler
> -------------------------------------------------------
>
> Key: TIKA-532
> URL: https://issues.apache.org/jira/browse/TIKA-532
> Project: Tika
> Issue Type: Bug
> Affects Versions: 0.8
> Reporter: Reinhard Schwab
> Fix For: 0.8
>
>
> BodyContentHandler extracts text correctly from most pages,
> but not from this one:
> http://www.lucidimagination.com/developers/whitepapers/whats-new-solr-14
> The page contains a selection list, and
> the text returned by BodyContentHandler for it contains
> "...Country: *
> -- Select a Country -- United
> StatesCanadaArgentinaAustraliaBrazilChinaFranceGermanyIndiaIndonesiaItalyJapanMexicoRussiaSaudi"
> A space between the country names would be preferable.
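
The run-together country names reported above can be reproduced without Tika. The following is only a minimal sketch under stated assumptions: it uses the JDK's SAX parser on a small hand-written XHTML fragment rather than Tika's HTML parser or the page from the report, and the class OptionSpacingSketch and its extract method are hypothetical names made up for illustration. It shows that a handler which only concatenates character data fuses adjacent option elements, and that appending a space when each option element closes is one possible way to keep them apart. It is not a description of how TIKA-394 was actually resolved.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration class; not part of Tika.
class OptionSpacingSketch {

    /** Collects character data; optionally appends a space when an option element closes. */
    static class TextCollector extends DefaultHandler {
        private final boolean spaceAfterOption;
        private final StringBuilder text = new StringBuilder();

        TextCollector(boolean spaceAfterOption) {
            this.spaceAfterOption = spaceAfterOption;
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // Character data is appended as-is, with no separator between elements.
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            // The default (non-namespace-aware) parser reports the tag name in qName.
            if (spaceAfterOption && "option".equals(qName)) {
                text.append(' ');
            }
        }

        String text() {
            return text.toString().trim();
        }
    }

    /** Parses a well-formed XHTML fragment and returns the collected text. */
    public static String extract(String xhtml, boolean spaceAfterOption) {
        try {
            TextCollector handler = new TextCollector(spaceAfterOption);
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xhtml)), handler);
            return handler.text();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample fragment", e);
        }
    }

    public static void main(String[] args) {
        // Hand-written sample, assumed to resemble the country list on the reported page.
        String xhtml = "<select><option>United States</option>"
                + "<option>Canada</option><option>Argentina</option></select>";
        System.out.println(extract(xhtml, false)); // United StatesCanadaArgentina
        System.out.println(extract(xhtml, true));  // United States Canada Argentina
    }
}
```

Adding the separator in endElement rather than in characters keeps text inside a single option untouched, since a SAX parser may deliver one text node in several characters calls.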