kinow commented on a change in pull request #72: IMAGING-251 support for TIFF floating-point formats
URL: https://github.com/apache/commons-imaging/pull/72#discussion_r404591250
##########
File path: src/main/java/org/apache/commons/imaging/formats/tiff/datareaders/DataReaderTiled.java
##########
@@ -14,6 +14,84 @@
  * See the License for the specific language governing permissions and
  * limitations under the License.
  */
+ /*
+ * Implementation Notes:
+ *
+ * Additional implementation notes are given in DataReaderStrips.java
+ *
+ * The TIFF Floating-Point Formats ----------------------------------
+ * In addition to providing images, TIFF files can supply data in the
+ * form of numerical values. In March 2020, the Commons Imaging library
+ * was extended to support some floating-point data formats.
+ * Unfortunately, the floating-point format allows for a lot of different
+ * variations and only the most widely used of these are currently supported.
+ * At the time of implementation, only a small set of data products was
+ * available. Thus it is likely that developers will wish to extend this capability
+ * as additional test data become available. When implementing extensions
+ * to this logic, developers are reminded that image processing requires
+ * access to literally millions of pixels, so attention to performance
+ * is essential to a successful implementation (please see the notes in
+ * DataReaderStrips.java for more information).
+ * The TIFF floating-point implementation is very poorly documented.
+ * So these notes are included to provide clarification on at least
+ * some aspects of the format.
+ *
+ * The Predictor==3 Case
+ * TIFF specifies an extension for a predictor that is intended to
+ * improve data compression ratios for floating-point values. This
+ * predictor is specified using the TIFF predictor TAG with a value of 3
+ * (see TIFF Technical Note 3, April 8, 2005). Consider a 4-byte floating
+ * point value given in IEEE-754 format. Let f3 be the high-order byte,
+ * with f2 the next highest, followed by f1, and f0 for the
+ * low-order byte.
+ * This designation refers to the numerical significance of the bytes and
+ * should not be confused with their in-memory layout (little-endian versus
+ * big-endian). The sign bit and upper 7 bits of the exponent
+ * are given in the high-order byte, followed by the remaining exponent bit
+ * and the mantissa in the lower bytes.
+ * In many real-valued raster data sets, the sign and magnitude (exponent)
+ * of the values change slowly while the contents of the mantissa vary in
+ * a semi-random manner, with the information entropy tending to increase
+ * in the lowest-order bytes. Thus, the high-order bytes have more
+ * redundancy than the low-order bytes and can compress more efficiently.
+ * To exploit this, the TIFF format splits the bytes into groups based on their
+ * order-of-magnitude. This splitting process takes place on a ROW-BY-ROW
+ * basis (note the emphasis, this point is not clearly documented in the spec).
+ * For example, for a row length of 3 pixels -- A, B, and C -- the data
+ * for two rows would be given as shown below (again, ignoring endian issues):
+ * Original:
+ *     A3 A2 A1 A0 B3 B2 B1 B0 C3 C2 C1 C0
+ *     D3 D2 D1 D0 E3 E2 E1 E0 F3 F2 F1 F0
+ *
+ * Bytes split into groups by order-of-magnitude:
+ *     A3 B3 C3 A2 B2 C2 A1 B1 C1 A0 B0 C0
+ *     D3 E3 F3 D2 E2 F2 D1 E1 F1 D0 E0 F0
+ *
+ * To further improve the compression, the predictor takes the difference
+ * between each byte and the byte that precedes it. Again, the differences
+ * (deltas) are computed on a row-by-row basis. For the most part, the
+ * differences combine bytes associated with the same order-of-magnitude,
+ * though there is a special transition at the end of each
+ * order-of-magnitude set (shown in parentheses):
+ *
+ *     A3, B3-A3, C3-B3, (A2-C3), B2-A2, C2-B2, (A1-C2), etc.
+ *     D3, E3-D3, F3-E3, (D2-F3), E2-D2, etc.
+ *
+ * Once the predictor transform is complete, the data is stored using
+ * conventional data compression techniques such as Deflate or LZW.
+ * In practice, floating-point data does not compress especially well, but
+ * using the above technique, the TIFF process typically reduces the overall
+ * storage size by 20 to 30 percent (depending on the data).
+ * The TIFF Technical Note 3 specifies 3 data size formats for
+ * storing floating-point values:
+ *     32 bits    IEEE-754 single-precision standard
+ *     16 bits    IEEE-754 half-precision standard
+ *     24 bits    A non-standard representation
+ * At this time, we have not obtained data samples for the smaller
+ * representations. There are also reports of 64-bit data
+ * (see Commons Imaging JIRA issue IMAGING-102), though documentation
+ * for that format was not available when these notes were written.
+ */

Review comment:
This is some great documentation! I will send a pull request later to format it as Javadoc while I re-read it more calmly :+1: :clap:

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

With regards,
Apache Git Services
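
For readers following the Predictor==3 notes quoted in the diff, the row-wise byte split and differencing can be sketched roughly as below. This is only an illustration, not the code in the pull request: the class `FloatPredictorSketch` and the methods `encodeRow` and `decodeRow` are hypothetical names, and the sketch assumes 32-bit samples, one sample per pixel, and the high-order byte group stored first (that is, ignoring the endian issues the notes set aside).

```java
import java.util.Arrays;

/** Hypothetical illustration only; not the Commons Imaging implementation. */
final class FloatPredictorSketch {

    /** Splits one row into byte groups (f3, f2, f1, f0), then differences the bytes. */
    static byte[] encodeRow(float[] row) {
        int n = row.length;
        byte[] out = new byte[4 * n];
        for (int i = 0; i < n; i++) {
            int bits = Float.floatToRawIntBits(row[i]);
            out[i] = (byte) (bits >>> 24);        // f3: sign + upper 7 exponent bits
            out[n + i] = (byte) (bits >>> 16);    // f2: last exponent bit + mantissa
            out[2 * n + i] = (byte) (bits >>> 8); // f1: mantissa
            out[3 * n + i] = (byte) bits;         // f0: mantissa
        }
        // Work backwards so each subtraction still sees the original predecessor.
        for (int i = out.length - 1; i > 0; i--) {
            out[i] -= out[i - 1];
        }
        return out;
    }

    /** Reverses encodeRow for a row of n samples. */
    static float[] decodeRow(byte[] encoded, int n) {
        byte[] b = encoded.clone();
        // Undo the differencing with a running sum (byte arithmetic wraps mod 256).
        for (int i = 1; i < b.length; i++) {
            b[i] += b[i - 1];
        }
        float[] row = new float[n];
        for (int i = 0; i < n; i++) {
            int bits = ((b[i] & 0xff) << 24)
                    | ((b[n + i] & 0xff) << 16)
                    | ((b[2 * n + i] & 0xff) << 8)
                    | (b[3 * n + i] & 0xff);
            row[i] = Float.intBitsToFloat(bits);
        }
        return row;
    }

    public static void main(String[] args) {
        float[] row = {1.5f, 1.75f, -2.0f};
        float[] decoded = decodeRow(encodeRow(row), row.length);
        System.out.println(Arrays.equals(row, decoded) ? "round trip ok" : "round trip failed");
    }
}
```

Each row is passed through `encodeRow` on its own, which mirrors the ROW-BY-ROW point the notes emphasize; a production reader would more likely operate in place on the decompressed tile or strip buffer for performance.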
