On 2026/06/04 22:01:32 Andrew Bell wrote:
> How can a reader know that it has the tooling to read a file with this
> approach? 

At present there isn't an in-use mechanism beyond parsing the "created_by" 
string.
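
As a rough sketch of what that parsing amounts to, assuming the writer
follows the conventional "<application> version <version> (build <hash>)"
layout for created_by (the function name and regex are mine, not from any
Parquet library, and real-world strings may not match):

```python
import re

# Assumed layout: "<application> version <version> (build <hash>)".
# The "(build ...)" part is treated as optional.
_CREATED_BY = re.compile(
    r"^(?P<app>.+?) version (?P<version>\S+)(?: \(build (?P<build>[^)]*)\))?"
)

def parse_created_by(created_by):
    """Return (application, version, build) or None if the string does not match."""
    m = _CREATED_BY.match(created_by)
    if m is None:
        return None
    return m.group("app"), m.group("version"), m.group("build")
```

A reader would still need its own table of which application/version pairs
it can handle, which is why this doesn't really answer the question.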

> What is the hesitation to change version numbers?

Which version number? The version number in the FileMetaData would sort of
work, except in the case of an incompatible change made to the metadata. We
could change the file magic from PAR1 to something else, but that is not
workable beyond PAR9, say. Also, the file magic really shouldn't change
frequently, as that breaks tools like the unix "file" command.

One thought I had, that should not break any current readers, would be to
expand the header from 4 to 8 bytes, say. We could embed a version number in
bytes 4-7: a decimal 2026 perhaps (if we use calendar year only), or 202606.
Or use SemVer, one byte each for major/minor/patch. Or make the header longer
and embed a fixed-length, space- or null-padded string. This expanded header
shouldn't break current readers, since the offset for the first page should
be obtained from the ColumnMetaData. If there are readers that rely on a page
starting immediately after the 'PAR1', we could mandate that the first byte
following PAR1 is 0. A Thrift parser would read that zero as the stop field
ending the PageHeader struct and then likely fail on the missing required
fields.
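
To make the idea concrete, here is a minimal sketch of one possible layout,
combining two of the options above: byte 4 is the mandated 0 and bytes 5-7
hold SemVer major/minor/patch. The layout, the constant names, and both
function names are assumptions for illustration only; nothing like this
exists in the format today.

```python
import struct

MAGIC = b"PAR1"
HEADER_LEN = 8  # hypothetical expanded header: 4 bytes magic + 4 bytes version info

def write_header(major, minor, patch):
    """Build the hypothetical 8-byte header: PAR1, a 0 byte, then major/minor/patch."""
    return MAGIC + struct.pack("BBBB", 0, major, minor, patch)

def read_header(buf):
    """Return (major, minor, patch), or None for a legacy 4-byte header."""
    if len(buf) < HEADER_LEN or buf[:4] != MAGIC:
        raise ValueError("not a Parquet file")
    stop, major, minor, patch = struct.unpack("BBBB", buf[4:8])
    if stop != 0:
        # Non-zero byte after the magic: assume the data starts right after
        # PAR1, i.e. the file predates the expanded header.
        return None
    return (major, minor, patch)
```

The non-zero check leans on the assumption that a legacy file has a Thrift
PageHeader directly after the magic, whose first byte is a field header and
therefore not 0.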

Ed
