[ https://issues.apache.org/jira/browse/TIKA-1641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Allison updated TIKA-1641:
------------------------------
Attachment: test_ws_tika_tika.txt
test_ws_tika_pdfbox.txt
I'm attaching a straight ExtractText dump from PDFBox's app as well as
tika-app's output with -t.
Yes, there are a few more newlines between
{noformat}
asd fasd 12.23.34.45
{noformat}
and
{noformat}
magna aliqua. Ut
{noformat}
And there are more newlines at the end of the file.
Which newlines are causing problems for you, and do they change the meaning of
the document or is this a problem with rendering?
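To pin down where the two dumps differ, it may help to count the runs of blank lines in each. Here is a minimal sketch, assuming the two attachments have been read into strings; `blank_line_runs` is a hypothetical helper written for this comment, not part of Tika or PDFBox.

```python
def blank_line_runs(text):
    """Return the length of each run of consecutive blank lines in text.

    A line counts as blank if it is empty or contains only whitespace.
    """
    runs = []
    current = 0
    for line in text.splitlines():
        if line.strip() == "":
            current += 1
        else:
            # A non-blank line ends the current run, if there is one.
            if current:
                runs.append(current)
            current = 0
    # Record a run that reaches the end of the text (trailing blank lines).
    if current:
        runs.append(current)
    return runs
```

Running it on the contents of test_ws_tika_pdfbox.txt and test_ws_tika_tika.txt and comparing the two lists should show which gaps grew, including the one at the end of the file.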
> Extra whitespace produced while extracting bodycontent in tika gui
> ------------------------------------------------------------------
>
> Key: TIKA-1641
> URL: https://issues.apache.org/jira/browse/TIKA-1641
> Project: Tika
> Issue Type: Bug
> Components: gui, handler
> Affects Versions: 1.6
> Reporter: cheehoo
> Attachments: File1.pdf, test_ws_tika_pdfbox.txt,
> test_ws_tika_tika.txt, tika-whitespace.png
>
>
> PDF import into tika gui added extra whitespace/newline in the main content.