[ 
https://issues.apache.org/jira/browse/TIKA-1641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Allison updated TIKA-1641:
------------------------------
    Attachment: test_ws_tika_tika.txt
                test_ws_tika_pdfbox.txt

I'm attaching a straight ExtractText dump from PDFBox's app as well as 
tika-app's output with -t.

Yes, there are a few more newlines between
{noformat}
asd fasd   12.23.34.45 
{noformat}

and
{noformat}
magna aliqua. Ut 
{noformat}

And there are more newlines at the end of the file.
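
To pin down exactly where the two dumps disagree, something like the following could be run over both text files. This is only a minimal sketch: the helper names `blank_lines_after` and `compare_dumps` are hypothetical (they are not part of Tika or PDFBox), the sample strings are made up rather than taken from the attachments, and it assumes both dumps contain the same non-blank lines in the same order.

```python
def blank_lines_after(text):
    """Return [(line, n)] where n is the number of blank lines following each non-blank line."""
    counts = []
    for line in text.splitlines():
        if line.strip():
            # Start a new entry for every non-blank line.
            counts.append([line.strip(), 0])
        elif counts:
            # Blank line: attribute it to the most recent non-blank line.
            counts[-1][1] += 1
    return [(line, n) for line, n in counts]


def compare_dumps(a, b):
    """Return [(line, n_a, n_b)] for lines whose trailing blank-line counts differ.

    Assumes both dumps hold the same non-blank lines in the same order;
    pairs whose text does not match are skipped.
    """
    diffs = []
    for (la, na), (lb, nb) in zip(blank_lines_after(a), blank_lines_after(b)):
        if la == lb and na != nb:
            diffs.append((la, na, nb))
    return diffs


# Made-up sample data standing in for the two attachments.
pdfbox_text = "first paragraph\n\nsecond paragraph\n"
tika_text = "first paragraph\n\n\n\nsecond paragraph\n\n\n"

for line, n_pdfbox, n_tika in compare_dumps(pdfbox_text, tika_text):
    print(f"{line!r}: pdfbox={n_pdfbox} tika={n_tika}")
```

For the real files, the two strings would be read from test_ws_tika_pdfbox.txt and test_ws_tika_tika.txt instead of being hard-coded.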

Which newlines are causing problems for you? Do they change the meaning of 
the document, or is this only a rendering problem?

> Extra whitespace produced while extracting bodycontent in tika gui
> ------------------------------------------------------------------
>
>                 Key: TIKA-1641
>                 URL: https://issues.apache.org/jira/browse/TIKA-1641
>             Project: Tika
>          Issue Type: Bug
>          Components: gui, handler
>    Affects Versions: 1.6
>            Reporter: cheehoo
>         Attachments: File1.pdf, test_ws_tika_pdfbox.txt, 
> test_ws_tika_tika.txt, tika-whitespace.png
>
>
> PDF import into tika gui added extra whitespace/newline in the main content.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
