> If an application can distinguish between holes and areas with data, it can
> use this information to its advantage. However, if an application
> depends on holes and their precise layout in a file, then destroying
> this layout will lead to a dataloss situation.
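The quoted scenario can be made concrete with a minimal Python sketch. It assumes a Unix-like system where Python exposes `os.SEEK_DATA` and `os.SEEK_HOLE` (thin wrappers over lseek(2)); the helpers `data_extents`, `make_sparse`, `naive_copy` and `zero_skipping_copy` are hypothetical names made up for illustration. It creates a file with a hole, copies it once with plain read()/write() and once with a copier that skips all-zero blocks, and then asks the file system where it thinks the data is. What `data_extents` reports depends on the file system: on ones that track holes (for example ext4 or tmpfs on Linux) the source and the zero-skipping copy typically show one small extent around the payload while the plain copy shows one extent spanning the file; elsewhere everything may simply be reported as data.

```python
import errno
import os
import tempfile


def data_extents(path):
    """Return the (start, end) byte ranges the file system reports as data.

    Uses lseek(2) with SEEK_DATA/SEEK_HOLE via os.lseek. File systems that
    do not track holes typically report the whole file as one data extent.
    """
    extents = []
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        offset = 0
        while offset < size:
            try:
                start = os.lseek(fd, offset, os.SEEK_DATA)
            except OSError as exc:
                if exc.errno == errno.ENXIO:  # no data at or after offset
                    break
                raise
            end = os.lseek(fd, start, os.SEEK_HOLE)  # end of file counts as a hole
            extents.append((start, end))
            offset = end
    finally:
        os.close(fd)
    return extents


def make_sparse(path, size, payload, at):
    """Create a `size`-byte file whose only written bytes are `payload` at offset `at`."""
    with open(path, "wb") as f:
        f.truncate(size)  # extending via truncate usually leaves a hole
        f.seek(at)
        f.write(payload)


def naive_copy(src, dst, bufsize=64 * 1024):
    """Copy with plain read()/write(): every byte is preserved, the holes are not."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(bufsize)
            if not chunk:
                break
            fout.write(chunk)


def zero_skipping_copy(src, dst, block=4096):
    """Copy that turns all-zero blocks into holes (similar in spirit to cp --sparse=always).

    Readers cannot tell the difference: skipped regions still read back as zeros.
    """
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(block)
            if not chunk:
                break
            if chunk == bytes(len(chunk)):  # the block is all zeros
                fout.seek(len(chunk), os.SEEK_CUR)  # skip it instead of writing
            else:
                fout.write(chunk)
        fout.truncate()  # restore the length if the file ends in a skipped run


if __name__ == "__main__":
    size, at, payload = 4 * 1024 * 1024, 1024 * 1024, b"x" * 4096
    with tempfile.TemporaryDirectory() as d:
        src, dense, sparse = (os.path.join(d, n) for n in ("src", "dense", "sparse"))
        make_sparse(src, size, payload, at)
        naive_copy(src, dense)
        zero_skipping_copy(src, sparse)
        with open(src, "rb") as f:
            original = f.read()
        for name in (src, dense, sparse):
            with open(name, "rb") as f:
                same_bytes = f.read() == original
            print(os.path.basename(name), same_bytes, data_extents(name))
```

All three files read back as the identical byte sequence through read(); only SEEK_DATA/SEEK_HOLE can tell them apart, so an application that relies on the hole layout sees different results depending on how the file was copied.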
The possibility of a "dataloss situation" is what truly scares me, because this is now a data semantics issue.

The lseek, read, and write system calls have served us well for decades because they implement a simple abstraction: a Unix/Linux file can be seen by any application (i.e., not just the one that created the file) as a sequence of bytes. The idea of a hole as an "absence of data" that these system calls should support is a direct violation of that principle. SEEK_HOLE|DATA introduce artificial syntactic constructs that cannot be supported portably once a file is copied (unless the copy is made on the same disk type with the same block size). What will happen to all the backup systems (to tape, other storage devices, etc.)? What about systems that mirror data at geographically separated locations (possibly with different hardware setups) for increased reliability? Applications that depend on SEEK_HOLE|DATA will be inherently fragile and will behave inconsistently when run in different environments.

It seems to me that the desire for a quick performance fix has led to an inferior solution in the form of the SEEK_HOLE|DATA constructs. The file system already supported an optimization in the form of treating sequences of zeros as holes. This optimization can be generalized, as a compressing file system would do, and it is completely transparent to ALL applications. Beyond that, the file system has no business introducing unnecessary semantics into a typeless stream of bytes.

There are only two ways to think about "absence of data". On the one hand, if it has meaningful use in terms of operations to be performed on a data stream, it should be explicitly encoded, with some signaling method, by the applications that require those semantics.
On the other hand, if "absence of data" has no meaningful operational semantics and is used just to improve the performance of disk storage and access, then leave it to the file system implementation to do what is right.

Phong

_______________________________________________
ast-users mailing list
[email protected]
https://mailman.research.att.com/mailman/listinfo/ast-users
