Hi Even,

This looks great, and I'm really looking forward to points 4 and 5.

> On 3 May 2019 at 04:04, Even Rouault <[email protected]> wrote:
>
> Hi,
>
> I wanted to mention COG related enhancements (*) that I will work on in GDAL
> in the coming weeks, so interested parties are aware of them and can
> potentially react.
>
> 1) Creation of a dedicated COG creation-only driver simplifying the creation
> workflow. Currently, creating a COG involves a number of steps, using gdaladdo
> and gdal_translate with the right arguments. For very large COG files,
> invoking gdaladdo in an efficient way can be tricky (.ovr.ovr trick:
> https://github.com/OSGeo/gdal/issues/1442). The driver will take care of
> creating the needed temporary overviews.
>
> 2) The driver will offer integrated reprojection capabilities, and in
> particular a WebMercator/GoogleMapsCompatible tiling scheme profile (as
> defined in WMTS), so that TIFF tiles exactly match GoogleMapsCompatible ones.
> This will be similar to the corresponding option of GeoPackage. With the
> subtlety that, due to how GeoTIFF overviews work, it is not possible to have
> this alignment on the tiling scheme for all zoom levels. So the user will
> define how many zoom levels, starting from the full resolution image, must be
> aligned (if N is the number of aligned levels, up to 2^N padding tiles in
> horizontal and vertical dimensions are needed for the full resolution image,
> so N should be kept reasonably small).
>
> 3) gdalwarp will be enhanced to allow output to drivers that have only
> CreateCopy() capabilities, such as the COG driver. It will try to avoid
> materializing the intermediate file when possible by using VRT capabilities,
> otherwise it will have to create a temporary TIFF file before calling
> CreateCopy().

About points 1, 2 and 3, I have mixed feelings: it seems to me that we would be
introducing a new driver just to replace a combination of existing GDAL
commands (disclaimer: as one of the creators of rio-cogeo, I may not be fully
objective here).

> 4) Optimizations specific to JPEG-compressed imagery (YCbCr color space) with
> a 1-bit transparency channel, to minimize the number of HTTP range requests
> needed to read them.
> As JPEG compression cannot include the transparency information, two TIFF
> IFDs have to be created: one for YCbCr, and another one for alpha. Currently
> the COPY_SRC_OVERVIEWS=YES creation option of the GeoTIFF driver separates
> data for all the tiles of the color channels from data for all the tiles of
> the transparency channel. In practice, readers will generally want to access,
> for a given location, the data of both color and transparency channels. I
> will modify the writer to interleave blocks so that color and transparency
> information are contiguous. If COLOR_X_Y designates the tile with color
> information at coordinates X,Y (in tile coordinate space), the layout of data
> in the file will be: COLOR_0_0, TRANSPARENCY_0_0, COLOR_1_0,
> TRANSPARENCY_1_0, etc. The GeoTIFF driver will be improved to fetch together
> the color and transparency channels when such a layout is detected.

Why is this specific to JPEG compression? What about other compressed formats
with an internal mask?

> A further improvement is to be able to completely avoid reading the
> TileByteCount array of the color channel, and the TileByteCount & TileOffset
> arrays of the transparency channel. The trick is to reserve 4 bytes before
> the start of each COLOR_X_Y tile to indicate its size (those bytes will be
> 'ghost', that is, not in the range of data pointed to by
> TileByteCount & TileOffset).
> An optimized reader wanting to read tile i=Y*nb_tiles_in_width+X will start
> by reading the offsets of tile i and i+1: TileOffset_color[i] and
> TileOffset_color[i+1]. It will then seek to TileOffset_color[i] - 4 and read
> 4 + TileOffset_color[i+1] - TileOffset_color[i] bytes into a buffer. The
> first 4 bytes of this buffer will indicate the number of bytes of the color
> tile, and thus it is possible to deduce the offset and size of the mask tile
> that is located at the end of the buffer. A TIFF metadata item will be
> written to indicate that such a layout has been used (with an indication of
> the file size, so as to be able to detect if the file has later been altered
> in a non-optimized way), so that optimized readers can adopt the behavior
> described above. This will require extending the libtiff interface so that
> the user can directly provide the input buffer to decompress.
> As the file will remain fully TIFF/BigTIFF compliant, non-optimized readers
> (such as newer GDAL builds against an older external libtiff version, or
> previous GDAL versions) will still be able to read it, loading values from
> the 4 arrays instead of just one.
> Note: for other compression types, a simpler version of the above
> optimization can still be done, by using TileOffset[i] and TileOffset[i+1],
> and saving the read of TileByteCount[i].
> To sum up, with the improvements of this task, once the initial loading of
> metadata has been done, a GDAL ReadBlock(x,y) request will cause only two
> network range requests: one to read TileOffset[i] and TileOffset[i+1]
> (potentially already cached if neighboring tiles have been previously
> accessed in the same process), and another one to read the imagery (+mask)
> data. Whereas currently, 6 might be needed for JPEG YCbCr+mask.
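
To make the ghost-byte read above concrete, here is a minimal Python sketch of
it. It is only an illustration under assumptions that are not in the proposal:
the 4-byte size is taken to be little-endian, the mask of tile i is taken to end
right before the 4 ghost bytes of tile i+1, and a trailing 4-byte pad plus one
extra offset entry are added so that the last tile can be read the same way.
build_interleaved() and read_tile() are hypothetical helper names, not GDAL or
libtiff functions, and the byte slice stands in for an HTTP range request.

```python
import struct

GHOST = 4  # bytes reserved before each color tile (byte order is an assumption)


def build_interleaved(tiles):
    """Lay out (color, mask) byte pairs as [size][COLOR][MASK] ...

    Returns the file bytes and the color TileOffsets array, with one extra
    trailing entry so that offsets[i + 1] also exists for the last tile.
    """
    data = bytearray(b"HDR_")  # stand-in for the TIFF header and IFDs
    offsets = []
    for color, mask in tiles:
        data += struct.pack("<I", len(color))  # ghost bytes: color tile size
        offsets.append(len(data))              # TileOffset_color[i]
        data += color + mask
    data += struct.pack("<I", 0)               # trailing pad (assumption)
    offsets.append(len(data))
    return bytes(data), offsets


def read_tile(data, offsets, i):
    """Fetch color and mask of tile i with a single contiguous read."""
    start = offsets[i] - GHOST
    length = GHOST + offsets[i + 1] - offsets[i]
    buf = data[start:start + length]           # one range request
    (color_size,) = struct.unpack_from("<I", buf, 0)
    color = buf[GHOST:GHOST + color_size]
    # What remains, minus the next tile's ghost bytes, is the mask tile.
    mask = buf[GHOST + color_size:len(buf) - GHOST]
    return color, mask


tiles = [(b"AAAA", b"m0"), (b"BBBBBB", b"m11"), (b"C", b"")]
data, offsets = build_interleaved(tiles)
for i, expected in enumerate(tiles):
    assert read_tile(data, offsets, i) == expected
```

Only the color TileOffsets array is consulted here; the color TileByteCount
array and both mask arrays are never read, which is the saving described in
point 4.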
>
> 5) Optimizing the layout of the header of a COG file
>
> The current layout of the header part of a COG file is:
> - TIFF / BigTIFF signature, followed by the offset of the first IFD (Image
>   File Directory)
> - IFD of full resolution image, that is the list of the tags and their value
>   when it consists of a single numeric value, followed by the offset of the
>   next IFD. Its size is 2 + number_of_tags * 12 + 4 (or
>   8 + number_of_tags * 20 + 8 for BigTIFF) bytes, so typically 200 bytes
>   maximum
> - Values of TIFF tags that don't fit inline in the IFD directory, such as
>   TileOffsets and TileByteCounts arrays and GeoTIFF keys
> - IFD of first overview (typically subsampled by a factor of 2)
> - Values of its tags that don't fit inline
> - ...
> - IFD of last overview
> - Values of its tags that don't fit inline
>
> When the COG file is not too large, having the TileOffsets and
> TileByteCounts between IFD descriptors is not an issue since they are not too
> large, and most TIFF readers will load their values when opening the IFD. But
> for an optimized reader such as GDAL with internal libtiff support (or with
> external libtiff after the optimization of task 4), loading the values of the
> TileOffsets/TileByteCounts arrays is only needed when accessing imagery.
>
> A more efficient layout for network access is:
> - TIFF / BigTIFF signature, followed by the offset of the first IFD
> - IFD of full resolution image, followed by the value of its non-inline tags,
>   except TileOffsets/TileByteCounts
> - IFD of first overview followed by the value of its non-inline tags, except
>   TileOffsets/TileByteCounts
> - IFD of last overview followed by the value of its non-inline tags, except
>   TileOffsets/TileByteCounts
> - Values of the TileOffsets/TileByteCounts arrays of IFD of full resolution
>   image
> - Values of the TileOffsets/TileByteCounts arrays of IFD of first overview
> - ...
> - Values of the TileOffsets/TileByteCounts arrays of IFD of last overview
>
> With such a structure, the initial reading of 16 KB at the start of the file
> will be able to load the IFD descriptors of all overviews (and masks, which
> are actually interleaved in between when present). So, combined with task 4,
> a cold read of a tile at any zoom level (i.e. opening the file + tile
> request) could result in just 3 network range requests: one to get the IFD
> descriptors at the start of the file, one to read the location of the tile
> from the TileOffsets array and one to read the tile data.
> The proposed structure itself is still fully TIFF compliant. The script that
> validates the COG structure will be adapted to accept that new variant of the
> header structure.
>
> Even
>
> (*) Funding by Land Information New Zealand / https://www.linz.govt.nz/
>
> --
> Spatialys - Geospatial professional services
> http://www.spatialys.com
> _______________________________________________
> gdal-dev mailing list
> [email protected]
> https://lists.osgeo.org/mailman/listinfo/gdal-dev
