[
https://issues.apache.org/jira/browse/TIKA-1427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158223#comment-14158223
]
Tim Allison edited comment on TIKA-1427 at 10/3/14 5:33 PM:
------------------------------------------------------------
On at least one test doc, I'm getting the correct behavior (the embedded images show up as <img> tags inside each page's <div>):
{noformat}
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="meta:creation-date" content="2002-12-31T14:13:29Z" />
...
</head>
<body><div class="page"><p />
<p> </p>
<p>What is a generic drug?</p>
...
<p>generic drugs. </p>
<p />
<img src="embedded:image0.png" alt="image0.png" /><img
src="embedded:image1.png" alt="image1.png" /><img src="embedded:image2.png"
alt="image2.png" /></div>
<div class="page"><p />
...
<p>Generic Drugs: Safe. Effective. FDA Approved.</p>
<p />
<img src="embedded:image3.png" alt="image3.png" /><img
src="embedded:image4.png" alt="image4.png" /></div>
<ul> <li>Local Disk</li>
<ul> <li>Generic Drugs</li>
</ul>
</ul>
</body></html>
{noformat}
Can you attach an example of a file that is failing?
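If it helps when comparing a failing file against the output above, here is a minimal sketch for checking which <img> tags end up in which page <div>. It assumes the XHTML has already been saved as a string. The function name images_by_page and the SAMPLE string (a trimmed copy of the output above, with the elided parts left out) are illustrative only and not part of Tika.

```python
import xml.etree.ElementTree as ET

# Namespace used by the XHTML output quoted above.
XHTML_NS = "{http://www.w3.org/1999/xhtml}"


def images_by_page(xhtml):
    """Return one list per <div class="page">, holding the src of each <img> inside it."""
    root = ET.fromstring(xhtml)
    pages = []
    for div in root.iter(XHTML_NS + "div"):
        if div.get("class") == "page":
            # Collect image sources in document order for this page.
            pages.append([img.get("src") for img in div.iter(XHTML_NS + "img")])
    return pages


# Trimmed copy of the output above (hypothetical test input, elided parts omitted).
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
<head><title /></head>
<body><div class="page"><p />
<p>What is a generic drug?</p>
<img src="embedded:image0.png" alt="image0.png" /><img
src="embedded:image1.png" alt="image1.png" /><img src="embedded:image2.png"
alt="image2.png" /></div>
<div class="page"><p />
<p>Generic Drugs: Safe. Effective. FDA Approved.</p>
<img src="embedded:image3.png" alt="image3.png" /><img
src="embedded:image4.png" alt="image4.png" /></div>
</body></html>"""

if __name__ == "__main__":
    for number, sources in enumerate(images_by_page(SAMPLE), start=1):
        print(number, sources)
```

A page that comes back with an empty list has no <img> tags in the structured output, which would match the behavior described in this issue.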
> PDF Images don't appear in structured view
> ------------------------------------------
>
> Key: TIKA-1427
> URL: https://issues.apache.org/jira/browse/TIKA-1427
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 1.6
> Reporter: James Baker
> Assignee: Tim Allison
> Labels: pdf
>
> When viewing, say, a Word Document, any images appear in the 'structured
> view' of the document as <img> tags. The same is not true of PDF documents,
> and we lose both the fact that there is an image present, and where it is in
> the document.
> There is some discussion of this issue in the comments on TIKA-1396.