On Tuesday 8 September 2020 13:06:19 CEST Christian Grün wrote:
> > Here is an example that creates a new archive that uses
> > compression-level="0" and algorithm="stored" and still compresses that
> > entry.
> >
> > Note that the archive level option 'algorithm' is unfortunate because
> > often it is only single entries such as 'mimetype' or images that should
> > not be compressed.
>
> Thanks for the example. – My observation is that the entry is indeed
> archived uncompressed if you choose compression-level="0"; but I think
> what you are saying is that an uncompressed DEFLATE entry is not the
> same as an uncompressed STORED entry, right, and that ODS and ePub
> files require certain files to be stored with the STORED algorithm, is
> that right?

The thing that counts is that you can read the mimetype entry name and
contents without decompression, starting from byte 30. That way tools such
as 'file' can report the mimetype.

The file generated with the attached script in BaseX 9.4.3 beta gives this:

$ file -i test.epub
test.epub: application/octet-stream; charset=binary

$ unzip -vl test.epub
Archive:  test.epub
 Length   Method    Size  Cmpr    Date    Time   CRC-32   Name
--------  ------  ------- ---- ---------- ----- --------  ----
      20  Defl:N       25 -25% 09-08-2020 13:54 2cab616f  mimetype
--------          -------  ---                            -------
      20               25 -25%                            1 file

$ hexdump -C test.epub | head -4
00000000  50 4b 03 04 14 00 08 08  08 00 d9 6e 28 51 00 00  |PK.........n(Q..|
00000010  00 00 00 00 00 00 00 00  00 00 08 00 00 00 6d 69  |..............mi|
00000020  6d 65 74 79 70 65 01 14  00 eb ff 61 70 70 6c 69  |metype.....appli|
00000030  63 61 74 69 6f 6e 2f 65  70 75 62 2b 7a 69 70 50  |cation/epub+zipP|

There are 5 bytes between 'mimetype' and 'application/epub+zip'
(01 14 00 eb ff). These are the DEFLATE block header: a marker byte for a
final, uncompressed block, then the length 0x0014 = 20 and its one's
complement 0xffeb. If the entry is 'stored', there are no bytes between the
entry name and the contents, so the zip will be recognized by the epub and
ODF applications. It will also use less space than when it is deflated with
compression-level 0.

> The Archive Module has a long history, and was initially based on a
> proposal for the Zorba XQuery Processor back in 2012. I don’t actually
> remember why the algorithm option was not adopted for the single
> archive entries; maybe that would have been more reasonable. As we
> seem to be the only implementation left today, we could think about
> changing that. I doubt anyway that people will use different
> compression levels for single archive entries (apart from archiving
> them uncompressed), so it might be a better solution to define one
> global compression level for the whole archive.

From a practical point of view (regardless of what is in the specification)
it makes sense to store 'mimetype' uncompressed, and also to use 'stored'
for files such as png and jpg that are already compressed. If that can be
achieved easily: great, but at least it should be possible. I think the
simplest solution is to save compression-level=0 as stored.

Best regards,
Jos
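
As a rough illustration of the byte-30 point, here is a minimal sketch in Python (not XQuery, and not what BaseX does internally). It only assumes the standard zipfile and struct modules; the helper names build_archive and raw_mimetype are made up for this example. It builds one archive with a 'stored' mimetype entry and one deflated at level 0, then tries to read the mimetype directly from the first local file header without inflating anything, roughly the kind of check a signature-based tool might do.

```python
import io
import struct
import zipfile

MIMETYPE = b"application/epub+zip"  # 20 bytes, as in the listing above


def build_archive(method, level=None):
    """Return the bytes of a zip whose only entry is 'mimetype' (hypothetical helper)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("mimetype", MIMETYPE, compress_type=method, compresslevel=level)
    return buf.getvalue()


def raw_mimetype(raw):
    """Read the mimetype straight after the first local file header.

    Returns None when the entry is not plain 'stored' data directly
    following the entry name (hypothetical helper, no decompression).
    """
    # The fixed part of a zip local file header is 30 bytes long.
    (sig, _ver, _flags, method, _time, _date, _crc, _csize, usize,
     name_len, extra_len) = struct.unpack("<4sHHHHHIIIHH", raw[:30])
    if sig != b"PK\x03\x04" or method != 0 or extra_len != 0:
        return None
    if raw[30:30 + name_len] != b"mimetype":
        return None
    start = 30 + name_len
    return raw[start:start + usize]


if __name__ == "__main__":
    stored = build_archive(zipfile.ZIP_STORED)
    deflated0 = build_archive(zipfile.ZIP_DEFLATED, 0)
    print("stored:  ", raw_mimetype(stored))
    print("deflate0:", raw_mimetype(deflated0))
    # Bytes that follow the entry name in the level-0 deflated archive.
    print("after name:", deflated0[38:43].hex(" "))
```

With the stored archive the contents follow the entry name immediately; with the level-0 deflated archive the method field is 8 and extra header bytes sit in between, so the direct read is refused.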