[ 
https://issues.apache.org/jira/browse/TIKA-2177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15929726#comment-15929726
 ] 

Sara Miller commented on TIKA-2177:
-----------------------------------


I understand. It is OK for us to leave it as it is; we can work around this 
on our side. 

Thank you for checking! 


> microsoft.OfficeParser adds links in additional paragraphs
> ----------------------------------------------------------
>
>                 Key: TIKA-2177
>                 URL: https://issues.apache.org/jira/browse/TIKA-2177
>             Project: Tika
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 1.13
>         Environment: org.apache.tika.parser.microsoft.OfficeParser and 
> org.apache.tika.parser.microsoft.ooxml.OOXMLParser
>            Reporter: Sara Miller
>            Priority: Minor
>
> I'm converting Excel files, both .xls and .xlsx.
> .xls uses org.apache.tika.parser.microsoft.OfficeParser and 
> .xlsx uses org.apache.tika.parser.microsoft.ooxml.OOXMLParser
> If my Excel document contains a link, for example [email protected], the .xls 
> parser adds extra elements to the document structure, so the output no 
> longer reflects how the document actually looks. 
> For example, this table in file.xls: 
> mailadress    password
> [email protected]       hohoho
> will output: 
> <div class="page">
>     <h1>Sheet1</h1>
>     <table>
>         <tbody>
>             <tr>
>                 <td>mailadress</td>
>                 <td>password</td>
>             </tr>
>             <tr>
>                 <td>[email protected]</td>
>                 <td>hohoho</td>
>             </tr>
>         </tbody>
>     </table>
>     <div class="outside">
>         <a href="mailto:[email protected]">[email protected]</a>
>     </div>
> </div>
> The <div class="outside"> should be removed because it does not correspond to 
> the document structure. 
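
A possible client-side workaround, sketched here under assumptions and not 
taken from Tika itself: post-process the XHTML that the parser returns and 
drop every <div class="outside"> element before using it. The function name 
strip_outside_divs is hypothetical, and the sketch assumes the output is 
well-formed XML like the fragment quoted above. Note that serializing a 
namespaced document this way may add generated namespace prefixes.

```python
import xml.etree.ElementTree as ET


def strip_outside_divs(xhtml: str) -> str:
    """Return the XHTML with every <div class="outside"> element removed.

    Hypothetical helper; assumes the input is well-formed XML.
    """
    root = ET.fromstring(xhtml)
    # ElementTree elements have no parent pointer, so visit every element
    # as a potential parent and inspect its direct children.
    for parent in list(root.iter()):
        for child in list(parent):
            # Compare the local name only, so that a namespaced tag such as
            # "{http://www.w3.org/1999/xhtml}div" is matched as well.
            local_name = child.tag.rsplit("}", 1)[-1]
            if local_name == "div" and child.get("class") == "outside":
                # Removing the element also drops any text that directly
                # follows it (its tail); in the reported output that is
                # only whitespace.
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")
```

This keeps the table markup untouched and only removes the extra link 
container, which matches the output the .xlsx parser is reported to produce.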


