https://bz.apache.org/bugzilla/show_bug.cgi?id=59747
Bug ID: 59747
Summary: xlsx file does not conform to bit patterns used by
common file type detection software
Product: POI
Version: 3.14-FINAL
Hardware: PC
Status: NEW
Severity: normal
Priority: P2
Component: XSSF
Assignee: [email protected]
Reporter: [email protected]
Hi,
I'm creating this bug due to a problem we've encountered with POI-generated
xlsx files.
Apparently the order of zip entries in xlsx files is important for tools which
determine the file type by matching a byte pattern. See for example Apache Tika
(without the deeper OOXML support library) and Linux's file command.
The OOXML spec and Excel have no problem with POI files, but tools relying on a
certain byte pattern do.
Here is the output of unzip -l on a POI-generated xlsx file:
Archive:  poi.xlsx
  Length  Date      Time   Name
--------  --------  -----  ----
     591  02.06.16  12:40  _rels/.rels
    1063  02.06.16  12:40  [Content_Types].xml
     183  02.06.16  12:40  docProps/app.xml
     437  02.06.16  12:40  docProps/core.xml
     137  02.06.16  12:40  xl/sharedStrings.xml
     818  02.06.16  12:40  xl/styles.xml
     349  02.06.16  12:40  xl/workbook.xml
     569  02.06.16  12:40  xl/_rels/workbook.xml.rels
     670  02.06.16  12:40  xl/worksheets/sheet1.xml
--------                   -------
    4817                   9 files
And for a native file written by Excel:
Archive:  excel.xlsx
  Length  Date      Time   Name
--------  --------  -----  ----
    1032  01.01.80  00:00  [Content_Types].xml
     588  01.01.80  00:00  _rels/.rels
     557  01.01.80  00:00  xl/_rels/workbook.xml.rels
     906  01.01.80  00:00  xl/workbook.xml
    1542  01.01.80  00:00  xl/styles.xml
    6790  01.01.80  00:00  xl/theme/theme1.xml
    1306  01.01.80  00:00  xl/worksheets/sheet1.xml
     593  01.01.80  00:00  docProps/core.xml
     816  01.01.80  00:00  docProps/app.xml
--------                   -------
   14130                   9 files
Judging by the behaviour of Linux's file command and Tika, these tools seem to
expect [Content_Types].xml as the first entry, skip the second, and look for
"xl/" in the third entry.
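As a rough illustration of that heuristic, here is a minimal Java sketch. It
only mirrors the description above, not the actual magic rules used by file or
Tika. The class XlsxOrderCheck and its methods are hypothetical names, and
treating "xl/" as a prefix of the third entry name is an assumption.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Hypothetical helper; not part of POI, Tika, or file(1). */
class XlsxOrderCheck {

    /** Returns the entry names in the order they are physically stored in the archive. */
    static List<String> entryNames(InputStream in) throws IOException {
        List<String> names = new ArrayList<>();
        // ZipInputStream walks the local file headers sequentially,
        // so the list reflects the on-disk order of the entries.
        try (ZipInputStream zip = new ZipInputStream(in)) {
            for (ZipEntry e = zip.getNextEntry(); e != null; e = zip.getNextEntry()) {
                names.add(e.getName());
            }
        }
        return names;
    }

    /**
     * Approximates the heuristic described in this report (an assumption):
     * entry 1 is [Content_Types].xml, entry 2 is ignored,
     * entry 3 has a name starting with "xl/".
     */
    static boolean looksLikeXlsx(List<String> names) {
        return names.size() >= 3
                && "[Content_Types].xml".equals(names.get(0))
                && names.get(2).startsWith("xl/");
    }
}
```

Applied to the two listings above, this check would reject the POI order
(third entry docProps/app.xml) and accept the Excel order (third entry
xl/_rels/workbook.xml.rels).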
Would it be possible to fix the order of the entries?
We've written a simple post processing tool which rewrites the zip file but
would be happy to have this in POI proper.
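For reference, a post-processing step of this kind could look roughly like the
sketch below. This is not our actual tool and not POI code: the class
XlsxReorder, its methods, and the ranking (content types first, then
_rels/.rels, then everything under xl/, then the rest) are assumptions chosen
only so that the third entry falls under xl/. It reads the whole archive into
memory and does not preserve timestamps or other entry metadata.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Hypothetical post-processor; not part of POI. */
class XlsxReorder {

    /** Assumed ranking: content types, package relationships, xl/ parts, the rest. */
    static int rank(String name) {
        if (name.equals("[Content_Types].xml")) return 0;
        if (name.equals("_rels/.rels")) return 1;
        if (name.startsWith("xl/")) return 2;
        return 3;
    }

    /** Returns a new list sorted by rank; the sort is stable, so ties keep their order. */
    static List<String> sortedNames(List<String> names) {
        List<String> sorted = new ArrayList<>(names);
        sorted.sort(Comparator.comparingInt(XlsxReorder::rank));
        return sorted;
    }

    /** Rewrites the archive with the same entries in the assumed order. */
    static byte[] reorder(byte[] xlsx) throws IOException {
        // Read every entry into memory, remembering the original order.
        Map<String, byte[]> parts = new LinkedHashMap<>();
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(xlsx))) {
            for (ZipEntry e = in.getNextEntry(); e != null; e = in.getNextEntry()) {
                parts.put(e.getName(), in.readAllBytes()); // Java 9+
            }
        }
        // Write the entries back out in ranked order.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(buf)) {
            for (String name : sortedNames(new ArrayList<>(parts.keySet()))) {
                out.putNextEntry(new ZipEntry(name));
                out.write(parts.get(name));
                out.closeEntry();
            }
        }
        return buf.toByteArray();
    }
}
```

With the POI listing above as input, the rewritten archive would start with
[Content_Types].xml, _rels/.rels, xl/sharedStrings.xml. That differs from the
exact order Excel writes, but should satisfy the pattern described above.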
Thanks, and please contact me if I can help.