Good to get some high-quality feedback.
On Sat, 9 Aug 2003, Leonard Rosenthol wrote:
> At 6:01 PM -0700 8/8/03, Nathan Carl Summers wrote:
> >Let us start with an existing graphics format, for inspiration if nothing else.
> >The format I chose is PNG, because it is arguably the best existing
> >lossless portable graphics format available.
> Well, I would argue that TIFF has the "crown"...
> However, PNG is an excellent standard, regardless.
Good point. It can't hurt to take a look at several graphics formats and
take the best parts from each of them.
> >4 capable of representing trees and graphs
> Trees, yes - for things like layers. But why a graph??
GEGL supports graphs. If we use GEGL graphs, we'll need a representation
> >5 recoverable from corruption
> >6 fast random access of data
> >9 fast loads and saves
> >10 compact
> Good goals, but not requirements. Perhaps you should
> separate those two things out...
I see fast loads as an absolute requirement. Being compact is nice as
well, because not everyone has 3 terabyte hard drives and a T3 line into
Hopefully, GIMP's file handling will improve to the point where it will
load things on an as-needed basis. Therefore, fast random access is
necessary. A VIPS-like demand-driven pipeline would increase GIMP
responsiveness a lot.
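
To illustrate what I mean by loading on an as-needed basis, here is a rough Python sketch. It is not how GIMP or VIPS actually work; the LazyTiles class, the fixed tile size, and the flat tile layout are all assumptions made up for this example.

```python
import io

class LazyTiles:
    """Reads a tile from the file only when it is first requested.

    Hypothetical helper: assumes fixed-size tiles stored back to back.
    """

    def __init__(self, stream, tile_size):
        self.stream = stream
        self.tile_size = tile_size
        self.cache = {}   # tile index -> bytes already loaded
        self.reads = 0    # how many times we actually touched the file

    def tile(self, index):
        if index not in self.cache:
            # Random access: jump straight to the tile, read nothing else.
            self.stream.seek(index * self.tile_size)
            self.cache[index] = self.stream.read(self.tile_size)
            self.reads += 1
        return self.cache[index]

image_file = io.BytesIO(bytes(range(16)))   # stand-in for a file on disk
tiles = LazyTiles(image_file, tile_size=4)
tiles.tile(2)
tiles.tile(2)            # second request is served from the cache
print(tiles.reads)       # 1
```

This only pays off if the format lets us compute or look up where a tile starts, which is why I tie it to fast random access.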
> And I can think of other goals that I'd like to see:
> * incremental update
> just update a single layer w/o rewriting the whole file!
This seems like an excellent goal. It seems like you are suggesting a
> * rich metadata
> (this may be your 7, but needs to be spelled out)
Well, that was what I meant by extensibility and the ability to represent
anything GIMP can. I agree that this is important.
> >PNG certainly supports 1,2,6,7,9,10, and 11. Let us examine the other
> >issues in more detail.
> I would argue that PNG doesn't do 7 - it has no native
> support for CMYK, for example. (but yes, it does RGB, Gray and
> And for comparison, I would offer that TIFF does the same
> list and REALLY does 7, including CMYK, Lab, ICC and Spot color
> spaces. Its extensibility is similar to PNG (in fact, PNG's chunks
> were modelled on TIFF chunks).
> >A pure XML format, by way of comparison, would fulfill requirements
> >1,2,3,4,7, and 8.
> I'd add 9, just being in XML doesn't mean it can't be fast.
I guess if you used raw image data instead of base64 or something similar
> > Requirement 5 in practice would be difficult to fulfill
> >in a pure XML format without hand-hacking, which is beyond the skills of
> >most users. A zlib-style compression step could make some progress
> >towards 10.
> But gzipping the entire XML block would then pretty much make 6
> impossible unless you want to seriously increase in-memory
> >An archive with XML metadata and png graphical data, on the other hand,
> >would satisfy requirements 1,2,3,4,7,8, and 11.
> An archive (zip, tar, ar) with XML metadata plus raster image
> data (ie. my previous proposal) would meet 1,2,3,4,6,7,8,10,11. 5 &
> 10 are related to the archive format of choice since some are better
> at these than others. But yes, I suspect that it would probably be a
> bit slower.
> >Requirement 6 is
> >fulfilled for simple images, but for more complex images XML does not
> >scale well, since every byte from the beginning of the XML file to the
> >place where the data you are interested in is located must be read.
> But the XML is just a "catalog" of what's in the archive (at
> least in my proposal). So you read the catalog up front and then use
> it to quickly find the part of the archive you want and voilà - fast
> random access to data.
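
For reference, the catalog idea quoted above might look roughly like this in Python. The catalog element and attribute names (catalog, entry, name, offset, length) are invented for the example and are not part of any actual proposal; the "archive" is just a byte buffer.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical catalog layout; the names here are assumptions.
CATALOG = """
<catalog>
  <entry name="layer0" offset="0" length="5"/>
  <entry name="layer1" offset="5" length="6"/>
</catalog>
"""

def load_catalog(xml_text):
    """Read the catalog once and map entry name -> (offset, length)."""
    root = ET.fromstring(xml_text.strip())
    return {e.get("name"): (int(e.get("offset")), int(e.get("length")))
            for e in root.iter("entry")}

def read_entry(stream, catalog, name):
    """Seek straight to one entry without reading the others."""
    offset, length = catalog[name]
    stream.seek(offset)
    return stream.read(length)

archive = io.BytesIO(b"AAAAABBBBBB")   # stand-in for the archive body
catalog = load_catalog(CATALOG)
print(read_entry(archive, catalog, "layer1"))   # b'BBBBBB'
```

The cost is that the whole catalog has to be parsed before the first seek, so how well this scales depends on how large the catalog gets.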
> >It seems like all we have to do is combine the strengths of PNG and the
> >strengths of XML to create a format that satisfies our requirements. What
> >we really need is not an extensible text markup language, but an
> >extensible graphics markup format.
> That's what TIFF and PNG were designed for.
> >Portable XCF would use a chunk system similar to PNG, with two major
> >differences. First, chunk type would be a string instead of a 32-bit
> >value. Second, chunks can contain an arbitrary number of subchunks, which
> >of course can contain subchunks themselves.
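
To make the quoted chunk layout a bit more concrete, here is a minimal Python sketch of chunks with string types and nested subchunks. The byte layout (16-bit type length, 32-bit payload and children lengths, big-endian) and the chunk names are assumptions for illustration only; nothing here is a settled design, and the checksum and close marker are left out.

```python
import struct

def write_chunk(ctype, payload=b"", children=()):
    """Serialize one chunk: type string, payload, then nested subchunks.

    Assumed layout: u16 type length, UTF-8 type, u32 payload length,
    payload, u32 total children length, serialized children.
    """
    name = ctype.encode("utf-8")
    kids = b"".join(children)
    return (struct.pack(">H", len(name)) + name +
            struct.pack(">I", len(payload)) + payload +
            struct.pack(">I", len(kids)) + kids)

def read_chunks(data):
    """Parse bytes into a list of (type, payload, subchunks) tuples."""
    out, pos = [], 0
    while pos < len(data):
        (nlen,) = struct.unpack_from(">H", data, pos)
        pos += 2
        ctype = data[pos:pos + nlen].decode("utf-8")
        pos += nlen
        (plen,) = struct.unpack_from(">I", data, pos)
        pos += 4
        payload = data[pos:pos + plen]
        pos += plen
        (klen,) = struct.unpack_from(">I", data, pos)
        pos += 4
        # Subchunks use the same layout, so recurse on their byte range.
        out.append((ctype, payload, read_chunks(data[pos:pos + klen])))
        pos += klen
    return out

layer = write_chunk("layer", b"name=bg", [write_chunk("pixels", b"\x00\x01")])
image = write_chunk("image", b"", [layer])
print(read_chunks(image))
```

Because each chunk records the total length of its children, a reader can step over a whole subtree it does not care about.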
> I think sub-chunks are a bad idea. Although a common way to
> represent hierarchical relationships, they can also put overhead on
> random access and also slow down read/write under certain conditions.
How about a TIFF-like directory chunk at the beginning (except
> >At the end of each chunk is a checksum, as well as a close-chunk marker.
> >The purpose of the close-chunk marker is to help recover in case of
> >corruption; if no corruption is detected, the close-chunk marker is
> This is a common technique in many file formats for
> corruption detection. It works.
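
As a rough illustration of the checksum plus close-marker technique, here is a Python sketch. The marker bytes, the use of CRC-32, and the flat (non-nested) chunk layout are assumptions chosen to keep the example short, not a proposed specification.

```python
import struct
import zlib

CLOSE = b"\x00/CHUNK\x00"   # hypothetical close-chunk marker

def pack(payload):
    """u32 length, payload, CRC-32 of the payload, then the close marker."""
    return (struct.pack(">I", len(payload)) + payload +
            struct.pack(">I", zlib.crc32(payload)) + CLOSE)

def unpack_all(data):
    """Return payloads of intact chunks, resyncing on the marker after damage."""
    good, pos = [], 0
    while pos + 4 <= len(data):
        (n,) = struct.unpack_from(">I", data, pos)
        end = pos + 4 + n + 4 + len(CLOSE)
        payload = data[pos + 4:pos + 4 + n]
        if (end <= len(data)
                and data[end - len(CLOSE):end] == CLOSE
                and struct.unpack_from(">I", data, pos + 4 + n)[0]
                    == zlib.crc32(payload)):
            good.append(payload)
            pos = end
            continue
        # Corruption detected: skip to just past the next close marker.
        # (A payload that happens to contain the marker could confuse this.)
        nxt = data.find(CLOSE, pos)
        if nxt < 0:
            break
        pos = nxt + len(CLOSE)
    return good

stream = pack(b"one") + pack(b"two") + pack(b"three")
damaged = bytearray(stream)
damaged[len(pack(b"one")) + 4] ^= 0xFF   # flip a byte inside "two"
print(unpack_all(bytes(damaged)))        # [b'one', b'three']
```

When nothing is damaged, the marker is only compared against the expected bytes; it is used for searching only after a check fails.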
> >One of the major advantages of this hybrid technique is that if an
> >implementation does not understand or is not interested in a particular
> >chunk, it can seek to the next chunk without having to read or parse any
> >of the data in-between.
> How does it do that? How do you find "start of chunk"
> without a catalog? How do you get random access to a particular
> chunk w/o a catalog?
It traverses the file in a linked-list style. But you are right that a
directory block would be even faster.
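
Here is a small Python sketch of the difference. For brevity it assumes a fixed 4-byte chunk type and a 32-bit length header rather than string types, and it builds the directory by walking the file once; a real directory block would be stored in the file itself. All names are made up for the example.

```python
import io
import struct

def write_stream(chunks):
    """chunks: list of (4-byte type, payload). Header = type + u32 length."""
    return b"".join(t + struct.pack(">I", len(p)) + p for t, p in chunks)

def find_linked(stream, wanted):
    """Hop from header to header, seeking over unwanted payloads."""
    stream.seek(0)
    hops = 0
    while True:
        header = stream.read(8)
        if len(header) < 8:
            return None, hops
        ctype, length = header[:4], struct.unpack(">I", header[4:])[0]
        hops += 1
        if ctype == wanted:
            return stream.read(length), hops
        stream.seek(length, io.SEEK_CUR)   # skip payload without reading it

def build_directory(stream):
    """Record chunk type -> (payload offset, length) in one pass."""
    stream.seek(0)
    directory = {}
    while True:
        header = stream.read(8)
        if len(header) < 8:
            return directory
        ctype, length = header[:4], struct.unpack(">I", header[4:])[0]
        directory[ctype] = (stream.tell(), length)
        stream.seek(length, io.SEEK_CUR)

def read_via_directory(stream, directory, wanted):
    """One seek, one read, no matter where the chunk sits."""
    offset, length = directory[wanted]
    stream.seek(offset)
    return stream.read(length)

data = write_stream([(b"meta", b"x" * 10), (b"thmb", b"y" * 20), (b"pixl", b"z" * 5)])
f = io.BytesIO(data)
print(find_linked(f, b"pixl"))                            # (b'zzzzz', 3)
print(read_via_directory(f, build_directory(f), b"pixl")) # b'zzzzz'
```

The linked-list walk needs one header read per chunk before the target, while the directory lookup needs a single seek once the directory is in memory.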
> >Image data chunks should use PNG-style adaptive predictive compression.
> >They should also use Adam7.
> Great - but that's not specific to a file format - we can do
> that anywhere...
Indeed we can.
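
For anyone unfamiliar with the PNG approach, here is a simplified Python sketch of adaptive per-row filtering. It assumes 8-bit grayscale rows (one byte per pixel) and only tries three of PNG's five filter types (None, Sub, Up), choosing per row by the minimum sum of absolute differences heuristic. The filtered output would then go to a general-purpose compressor such as zlib.

```python
def none_filter(row, prev):
    return bytes(row)

def sub_filter(row, prev):
    """Each byte minus the byte to its left (0 for the first byte)."""
    return bytes((row[i] - (row[i - 1] if i else 0)) & 0xFF
                 for i in range(len(row)))

def up_filter(row, prev):
    """Each byte minus the byte directly above it."""
    return bytes((row[i] - prev[i]) & 0xFF for i in range(len(row)))

# Keys are the PNG filter type codes for None, Sub and Up.
FILTERS = {0: none_filter, 1: sub_filter, 2: up_filter}

def cost(filtered):
    """Sum of absolute values, treating each byte as signed."""
    return sum(b if b < 128 else 256 - b for b in filtered)

def filter_rows(rows):
    """Pick the cheapest filter for each row; return (type, filtered) pairs."""
    prev = bytes(len(rows[0]))   # the row above the first row is all zeros
    out = []
    for row in rows:
        best = min(FILTERS, key=lambda k: cost(FILTERS[k](row, prev)))
        out.append((best, FILTERS[best](row, prev)))
        prev = row
    return out

rows = [bytes([10, 11, 12, 13]), bytes([10, 11, 12, 13])]
print(filter_rows(rows))
```

In this example the first row is cheapest with Sub (a smooth gradient) and the second with Up (identical to the row above), which is the kind of per-row adaptation I had in mind.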
Gimp-developer mailing list