Yahav Amsalem created TIKA-3257:
-----------------------------------

             Summary: RAR files extracted content is not separated from the 
inner file names
                 Key: TIKA-3257
                 URL: https://issues.apache.org/jira/browse/TIKA-3257
             Project: Tika
          Issue Type: Bug
          Components: parser
    Affects Versions: 1.23
            Reporter: Yahav Amsalem
         Attachments: test.rar

Attached is a RAR file containing a PPT file ("test.ppt") with one line of text 
in it: "Here the PPT content starts".

However, the text that Tika extracts *does not separate the file name from its 
content*; it comes out as follows:

"test.pptHere the PPT content starts"
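The expected behaviour can be stated as a small check on the extracted string. This is only a sketch: the class and method names (SeparatorCheck, nameIsSeparated) are hypothetical and not part of Tika, and it assumes that "separated" means the embedded file name is followed by whitespace or by the end of the text.

```java
// Hypothetical helper, not part of Tika: checks whether an embedded file
// name that appears in extracted text is followed by a separator.
class SeparatorCheck {

    // Returns true if the first occurrence of entryName in extracted is
    // followed by whitespace or by the end of the string. Also returns true
    // if entryName does not occur at all (nothing to separate).
    static boolean nameIsSeparated(String extracted, String entryName) {
        int start = extracted.indexOf(entryName);
        if (start < 0) {
            return true;
        }
        int end = start + entryName.length();
        return end == extracted.length()
                || Character.isWhitespace(extracted.charAt(end));
    }

    public static void main(String[] args) {
        // Output reported in this issue: name and content run together.
        String reported = "test.pptHere the PPT content starts";
        // Assumed desired output: a newline between name and content.
        String desired = "test.ppt\nHere the PPT content starts";

        System.out.println(nameIsSeparated(reported, "test.ppt"));
        System.out.println(nameIsSeparated(desired, "test.ppt"));
    }
}
```

With the string reported above the check fails, while a version with a newline (or any other whitespace) after "test.ppt" passes.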



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
