[ https://issues.apache.org/jira/browse/TIKA-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13276390#comment-13276390 ]
Nick Burch commented on TIKA-922:
---------------------------------
Tika is returning the values/text stored in the file itself, and is not doing
any interpretation on them. If iWork stores 90% as 0.9 (or as close to that as
floating point allows), then that's what we'll return.

For the Excel formats, something very similar gets stored in the files too.
However, for those formats we have a full library (Apache POI) around them
to handle the formatting.

As there's no such library for iWork at the moment, I wonder how close the
iWork formatting rules are to the Excel ones? If they're close enough, then we
might be able to re-use some of the formatting support in POI.
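
To illustrate what "handling formatting" means here, below is a minimal
sketch of turning a raw stored value plus a format rule back into the text the
user saw. It deliberately uses the JDK's java.text.DecimalFormat as a stand-in
rather than POI, and the class name, the helper method and the two patterns
are assumptions made up for the example. How iWork actually stores its format
rules is not covered.

```java
import java.math.RoundingMode;
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

// Hypothetical helper, not part of Tika or POI.
public class RawValueFormatting {

    // Apply a DecimalFormat pattern to the raw double read from the file.
    // Locale.ROOT keeps the output independent of the machine's locale.
    static String format(double raw, String pattern) {
        DecimalFormat df =
                new DecimalFormat(pattern, DecimalFormatSymbols.getInstance(Locale.ROOT));
        df.setRoundingMode(RoundingMode.HALF_UP);
        return df.format(raw);
    }

    public static void main(String[] args) {
        // Raw values as quoted in the issue; the patterns are assumed.
        System.out.println(format(0.9000000002, "0%"));   // 90%
        System.out.println(format(0.59999, "'$'0.00"));   // $0.60
    }
}
```

The point is only that the raw double on its own is not enough: the display
text depends on a format rule that has to be read from the file and applied by
something, which is the job POI does for the Excel formats.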
> iWork number cell formats which are being modified in parsing
> -------------------------------------------------------------
>
> Key: TIKA-922
> URL: https://issues.apache.org/jira/browse/TIKA-922
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 1.0
> Environment: Windows 7, 64 bit
> Reporter: Erik Peterson
> Labels: iwork
>
> iWork number cell formats are parsed by Tika, but come out in a modified form:
> Percentage turns into a decimal, e.g. 90% becomes .9000000002
> Accounting appends a $, but the $ is missing from the parsed data
> Fraction is turned into a decimal
> Number System (e.g. Binary) is translated to decimal, e.g. '11001000' becomes '200'
> Scientific numbers are translated to decimal, e.g. 9.0000E-03 becomes 9000
> Drop-down menu: all the menu items are parsed, but not which one is selected
> Currency & Number aren't displayed properly, e.g. $0.60 becomes .59999