On 2026/06/04 22:01:32 Andrew Bell wrote:
> How can a reader know that it has the tooling to read a file with this
> approach?
At present there isn't an in-use mechanism beyond parsing the "created_by" string.

> What is the hesitation to change version numbers?

Which version number? The version number in the FileMetaData would sort of work, except in the case of an incompatible change made to the metadata. We could change the file magic from PAR1 to something else, but that is not workable beyond PAR9, say. Also, the file magic really shouldn't change frequently as that breaks tools like the unix "file" command.

One thought I had, that should not break any current readers, would be to expand the header from 4 to 8 bytes, say. We could embed a version number in bytes 4-7. Writing a decimal 2026 perhaps (if we use calendar year only), or 202606. Or use SemVer, one byte each for major/minor/patch. Or make the header longer and embed a fixed-length, space or null padded string.

This expanded header shouldn't break current readers since the offset for the first page should be obtained from the ColumnMetaData. If there are readers that rely on a page starting immediately after the 'PAR1', we could mandate that the first byte following PAR1 is 0. A thrift parser would see that as the end of the PageHeader struct and then likely fail on missing required fields.

Ed
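
A minimal sketch of the expanded-header idea, combining two of the options above under one assumed layout: the 'PAR1' magic, a 0 byte, then one byte each for major/minor/patch, for 8 bytes in total. The layout and the names write_header and read_header are hypothetical and purely illustrative; nothing here is part of the Parquet format or any existing library.

```python
import struct

MAGIC = b"PAR1"


def write_header(major, minor, patch):
    """Build the assumed 8-byte header: magic, 0 byte, major, minor, patch."""
    # The 0 byte is what a Thrift parser would read as end-of-struct.
    return MAGIC + struct.pack("4B", 0, major, minor, patch)


def read_header(buf):
    """Return (major, minor, patch), or None for a legacy 4-byte header."""
    if buf[:4] != MAGIC:
        raise ValueError("bad magic, not a Parquet file")
    # A non-zero byte after the magic is treated as the start of ordinary
    # file content, i.e. a file written without the expanded header.
    if len(buf) < 8 or buf[4] != 0:
        return None
    _, major, minor, patch = struct.unpack("4B", buf[4:8])
    return (major, minor, patch)
```

A reader could call read_header on the first 8 bytes of a file and compare the result against the highest version it supports, falling back to the "created_by" string when the result is None.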
