[ https://issues.apache.org/jira/browse/IMAGING-259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gary Lucas updated IMAGING-259:
-------------------------------
Description:
TIFF files support many different formats, some of them legacy or specialty
formats, others that are widely used. DataReaderStrips and DataReaderTiled were
originally written with a single block of code that collected the raw data
(samples) for each pixel and then passed it into a single method that branched
depending on the format. This approach meant that for each pixel, the reader
loops had the extra overhead of a method call that executed multiple
conditional evaluations. In 2012, enhancements were added to imaging to execute
dedicated blocks of code for a few commonly used formats, most notably 3-byte
RGB. However, at this time, the dedicated code does not support the case where
the RGB data is stored with a differencing predictor. Predictors improve the
compression ratios (often significantly) when compressing RGB images. So I
propose to enhance the dedicated RGB code to support predictors.
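For illustration, here is a minimal sketch of what a dedicated loop for 8-bit RGB with horizontal differencing might look like. It assumes interleaved samples (R, G, B per pixel) that are already decompressed, and it is not the actual Commons Imaging code: the class and method names are hypothetical. With this predictor, each stored sample is the difference from the same channel of the previous pixel in the row, so the decoder keeps a running sum per channel, modulo 256.

```java
/** Illustrative sketch only; names are hypothetical, not the Commons Imaging API. */
final class RgbPredictorSketch {

    /**
     * Decodes one row of interleaved 8-bit RGB samples stored with horizontal
     * differencing and writes each pixel as an opaque 0xAARRGGBB value.
     */
    static void unpackRow(byte[] row, int width, int[] argb, int offset) {
        int r = 0;
        int g = 0;
        int b = 0;
        int k = 0;
        for (int x = 0; x < width; x++) {
            // Accumulate the stored differences; "& 0xff" wraps modulo 256.
            r = (r + row[k++]) & 0xff;
            g = (g + row[k++]) & 0xff;
            b = (b + row[k++]) & 0xff;
            argb[offset + x] = 0xff000000 | (r << 16) | (g << 8) | b;
        }
    }

    /** Encoder counterpart, included only so the sketch can be round-trip checked. */
    static byte[] differenceRow(int[] argb, int width) {
        byte[] out = new byte[width * 3];
        int pr = 0;
        int pg = 0;
        int pb = 0;
        int k = 0;
        for (int x = 0; x < width; x++) {
            int r = (argb[x] >> 16) & 0xff;
            int g = (argb[x] >> 8) & 0xff;
            int b = argb[x] & 0xff;
            out[k++] = (byte) (r - pr);
            out[k++] = (byte) (g - pg);
            out[k++] = (byte) (b - pb);
            pr = r;
            pg = g;
            pb = b;
        }
        return out;
    }
}
```

Because the accumulation happens inside the same loop that packs the pixel, there is no per-pixel method call and no per-pixel branch on the format, which is the kind of overhead the dedicated blocks are meant to avoid.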
Here's an example of some performance testing on a large image that uses
compression with imaging. The time to load images was extracted using the
{noformat}
Processing file: CONUS_LandWaterMask_LZW_RGB.tif (original)
image size: 6000 by 4000
time to load image   --   memory
time ms     avg ms   --   used mb    total mb
971.817      0.000   --   213.592     252.000
921.690      0.000   --   143.229     260.000
895.587    895.587   --    96.234     174.000
899.227    897.407   --   117.259     154.000
899.078    897.964   --   134.200     184.000
889.602    895.873   --   143.226     180.000
896.170    895.933   --   128.183     188.000
894.250    895.652   --    97.187     178.000
896.436    895.764   --   103.226     186.000
891.540    895.236   --   119.185     171.000

Processing file: CONUS_LandWaterMask_LZW_RGB.tif (with changes)
image size: 6000 by 4000
time to load image   --   memory
time ms     avg ms   --   used mb    total mb
498.123      0.000   --   212.589     252.000
423.136      0.000   --   110.733     237.000
396.021    396.021   --   100.735     164.000
400.435    398.228   --   115.725     160.000
400.901    399.119   --   114.726     162.000
395.092    398.112   --   118.711     159.000
394.106    397.311   --   118.710     159.000
400.866    397.903   --   118.710     159.000
400.972    398.342   --   115.710     160.000
397.218    398.201   --   109.691     164.000
{noformat}
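The "avg ms" column appears to be a running mean that skips the first two runs (presumably warm-up); that reading is inferred from the numbers and is not stated in the report. Under that assumption, the short sketch below reproduces the final averages from the "time ms" values and derives the overall speed-up. The class and method names are hypothetical.

```java
/** Illustrative sketch only; reproduces the reported averages under a stated assumption. */
final class LoadTimeSummary {

    // "time ms" column for the original code, copied from the report.
    static final double[] ORIGINAL_MS = {
        971.817, 921.690, 895.587, 899.227, 899.078,
        889.602, 896.170, 894.250, 896.436, 891.540
    };

    // "time ms" column for the code with changes, copied from the report.
    static final double[] CHANGED_MS = {
        498.123, 423.136, 396.021, 400.435, 400.901,
        395.092, 394.106, 400.866, 400.972, 397.218
    };

    /** Mean of the timings after skipping the first warmUpRuns entries. */
    static double meanAfterWarmUp(double[] timesMs, int warmUpRuns) {
        double sum = 0.0;
        int n = 0;
        for (int i = warmUpRuns; i < timesMs.length; i++) {
            sum += timesMs[i];
            n++;
        }
        return n == 0 ? 0.0 : sum / n;
    }

    public static void main(String[] args) {
        double before = meanAfterWarmUp(ORIGINAL_MS, 2);
        double after = meanAfterWarmUp(CHANGED_MS, 2);
        System.out.printf("original: %.3f ms, with changes: %.3f ms, speed-up: %.2fx%n",
                before, after, before / after);
    }
}
```

On these figures the load time drops from about 895 ms to about 398 ms, roughly a 2.2x speed-up for this one test image.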
Additionally, the special-purpose RGB block of code included additional logic
to support a case for non-RGB formats where image samples were organized as
three one-byte samples, but the photometric interpretation was not RGB.
According to Coveralls, this block of code is not exercised by any of our test
images, so that part of the code is not covered by testing. I will be removing
it to improve the code-coverage scores. I believe that this change is appropriate
because, even if there are TIFF files "in the wild" that use this
configuration, the commons imaging library will still work properly. In such a
case, the image samples would be handled properly by the original,
non-specialized block of code. Furthermore, I went through the TIFF
specification and did not see any obvious examples of a case where that
configuration would be likely.
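As a rough sketch of the resulting dispatch (hypothetical names, not the actual DataReaderStrips or DataReaderTiled code), the dedicated path would be chosen only for true RGB data with three 8-bit samples, and everything else would fall through to the general-purpose path. The tag values used below (PhotometricInterpretation 2 for RGB, Predictor 1 for none and 2 for horizontal differencing) come from the TIFF specification; the exact conditions checked by the library are an assumption.

```java
/** Illustrative sketch only; the real reader classes may test different conditions. */
final class TiffReaderDispatchSketch {

    static final int PHOTOMETRIC_RGB = 2;                    // TIFF PhotometricInterpretation
    static final int PREDICTOR_NONE = 1;                     // TIFF Predictor
    static final int PREDICTOR_HORIZONTAL_DIFFERENCING = 2;  // TIFF Predictor

    /**
     * Returns true when the dedicated 3-byte RGB loop applies; otherwise the
     * caller is expected to use the original, non-specialized per-pixel path.
     */
    static boolean useDedicatedRgbPath(int photometricInterpretation,
                                       int[] bitsPerSample,
                                       int predictor) {
        if (photometricInterpretation != PHOTOMETRIC_RGB) {
            // Three one-byte samples that are not RGB go to the general path.
            return false;
        }
        if (bitsPerSample.length != 3) {
            return false;
        }
        for (int bits : bitsPerSample) {
            if (bits != 8) {
                return false;
            }
        }
        return predictor == PREDICTOR_NONE
            || predictor == PREDICTOR_HORIZONTAL_DIFFERENCING;
    }
}
```

Under this arrangement, a file with three one-byte samples and a non-RGB photometric interpretation is still read, only by the slower general-purpose code.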
was:
TIFF files support many different formats, some of them legacy or specialty
formats, others that are widely used. DataReaderStrips and DataReaderTiled were
originally written with a single block of code that collected the raw data
(samples) for each pixel and then passed it into a single method that branched
depending on the format. This approach meant that for each pixel, the reader
loops had the extra overhead of a method call that executed multiple
conditional evaluations. In 2012, enhancements were added to imaging to execute
dedicated blocks of code for a few commonly used formats, most notably 3-byte
RGB. However, at this time, the code does not support the case where the RGB
is stored with a differencing predictor. Predictors improve the compression
ratios (often significantly) when compressing RGB images. So I propose to
enhance the dedicated RGB code to support predictors.
Here's an example of some performance testing on a large image that uses
compression with imaging
Processing file: CONUS_LandWaterMask_LZW_RGB.tif (original)
image size: 6000 by 4000
time to load image -- memory
time ms avg ms -- used mb total mb
971.817 0.000 -- 213.592 252.000
921.690 0.000 -- 143.229 260.000
895.587 895.587 -- 96.234 174.000
899.227 897.407 -- 117.259 154.000
899.078 897.964 -- 134.200 184.000
889.602 895.873 -- 143.226 180.000
896.170 895.933 -- 128.183 188.000
894.250 895.652 -- 97.187 178.000
896.436 895.764 -- 103.226 186.000
891.540 895.236 -- 119.185 171.000
Processing file: CONUS_LandWaterMask_LZW_RGB.tif (with chamges)
image size: 6000 by 4000
time to load image -- memory
time ms avg ms -- used mb total mb
498.123 0.000 -- 212.589 252.000
423.136 0.000 -- 110.733 237.000
396.021 396.021 -- 100.735 164.000
400.435 398.228 -- 115.725 160.000
400.901 399.119 -- 114.726 162.000
395.092 398.112 -- 118.711 159.000
394.106 397.311 -- 118.710 159.000
400.866 397.903 -- 118.710 159.000
400.972 398.342 -- 115.710 160.000
397.218 398.201 -- 109.691 164.000
Additionally, the special-purpose RGB block of code included additional logic
to support a case for non-RGB formats where image samples were organized 3 one
byte samples, but the photometric interpretation was not RGB. According to
Coveralls, this block of code is not exercised by any of our test images. Thus
that part of the code is uncovered by testing. So I will be removing it to
improve the code-coverage scores. I believe that this change is appropriate
because, even if there are TIFF files "in the wild" that use this
configuration, the commons imaging library will still work properly. In such a
case, the image samples would be handled properly by the original,
non-specialized block of code. Furthermore, I went through the TIFF
specification and did not see any obvious examples of a case where that
configuration would be likely.
> Enhance TIFF DataReaders speed for compressed RGB
> -------------------------------------------------
>
> Key: IMAGING-259
> URL: https://issues.apache.org/jira/browse/IMAGING-259
> Project: Commons Imaging
> Issue Type: Improvement
> Components: Format: TIFF
> Reporter: Gary Lucas
> Priority: Minor
>
> TIFF files support many different formats, some of them legacy or specialty
> formats, others that are widely used. DataReaderStrips and DataReaderTiled
> were originally written with a single block of code that collected the raw
> data (samples) for each pixel and then passed it into a single method that
> branched depending on the format. This approach meant that for each pixel,
> the reader loops had the extra overhead of a method call that executed
> multiple conditional evaluations. In 2012, enhancements were added to imaging
> to execute dedicated blocks of code for a few commonly used formats, most
> notably 3-byte RGB. However, at this time, the dedicated code does not
> support the case where the RGB data is stored with a differencing predictor.
> Predictors improve the compression ratios (often significantly) when
> compressing RGB images. So I propose to enhance the dedicated RGB code to
> support predictors.
> Here's an example of some performance testing on a large image that uses
> compression with imaging. The time to load images was extracted using the
>
> {noformat}
> Processing file: CONUS_LandWaterMask_LZW_RGB.tif (original)
> image size: 6000 by 4000
> time to load image   --   memory
> time ms     avg ms   --   used mb    total mb
> 971.817      0.000   --   213.592     252.000
> 921.690      0.000   --   143.229     260.000
> 895.587    895.587   --    96.234     174.000
> 899.227    897.407   --   117.259     154.000
> 899.078    897.964   --   134.200     184.000
> 889.602    895.873   --   143.226     180.000
> 896.170    895.933   --   128.183     188.000
> 894.250    895.652   --    97.187     178.000
> 896.436    895.764   --   103.226     186.000
> 891.540    895.236   --   119.185     171.000
>
> Processing file: CONUS_LandWaterMask_LZW_RGB.tif (with changes)
> image size: 6000 by 4000
> time to load image   --   memory
> time ms     avg ms   --   used mb    total mb
> 498.123      0.000   --   212.589     252.000
> 423.136      0.000   --   110.733     237.000
> 396.021    396.021   --   100.735     164.000
> 400.435    398.228   --   115.725     160.000
> 400.901    399.119   --   114.726     162.000
> 395.092    398.112   --   118.711     159.000
> 394.106    397.311   --   118.710     159.000
> 400.866    397.903   --   118.710     159.000
> 400.972    398.342   --   115.710     160.000
> 397.218    398.201   --   109.691     164.000
> {noformat}
>
>
> Additionally, the special-purpose RGB block of code included additional logic
> to support a case for non-RGB formats where image samples were organized as
> three one-byte samples, but the photometric interpretation was not RGB.
> According to Coveralls, this block of code is not exercised by any of our
> test images, so that part of the code is not covered by testing. I will be
> removing it to improve the code-coverage scores. I believe that this change is
> appropriate because, even if there are TIFF files "in the wild" that use this
> configuration, the commons imaging library will still work properly. In such
> a case, the image samples would be handled properly by the original,
> non-specialized block of code. Furthermore, I went through the TIFF
> specification and did not see any obvious examples of a case where that
> configuration would be likely.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)