> If an application can distinguish between holes and areas with data,
> it can use this information to its advantage. However, if an application
> depends on holes and their precise layout in a file, then destroying
> this layout will lead to a data loss situation.

The possibility of a "data loss situation" is what truly scares me,
because it turns this into a question of data semantics.

The lseek, read and write system calls have served us well for decades
because they implement a simple abstraction: any application (i.e., not
just the one that created the file) can view a Unix/Linux file as a
sequence of bytes. The idea that these system calls should expose a hole
as an "absence of data" directly violates that principle.
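A minimal Python sketch of that abstraction, assuming a POSIX system (the helper name read_with_gap is made up for illustration). The program seeks past the end of a file before writing again, and read() hands back the skipped range as ordinary zero bytes. Nothing in the returned data tells the reader whether the file system stored that range as a hole.

```python
import os
import tempfile


def read_with_gap(gap: int = 4096) -> bytes:
    """Hypothetical helper: write b"A", skip `gap` bytes, write b"B", read all back."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"A")
        # Seeking past the current end leaves a gap. The file system may
        # store it as a hole, but that is invisible to read().
        os.lseek(fd, gap, os.SEEK_CUR)
        os.write(fd, b"B")
        os.lseek(fd, 0, os.SEEK_SET)
        data = b""
        while chunk := os.read(fd, 65536):
            data += chunk
        return data
    finally:
        os.close(fd)
        os.unlink(path)


data = read_with_gap()
print(len(data), data[1:-1] == bytes(4096))  # 4098 True
```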

SEEK_HOLE and SEEK_DATA introduce artificial syntactic constructs that
cannot be supported portably once a file is copied (unless the copy is
made on the same disk type with the same block size). What happens to
backup systems that write to tape or other storage devices? What about
systems that mirror data at geographically separated sites, possibly on
different hardware, for increased reliability? Applications that depend
on SEEK_HOLE|DATA will be inherently fragile and will behave
inconsistently across environments.
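To make the copying concern concrete, here is a rough Python sketch. It is Linux-oriented (os.SEEK_DATA and os.SEEK_HOLE exist only where the OS provides them), and make_sparse, naive_copy and data_regions are hypothetical helper names. It lists the data regions of a sparse file, copies the file with plain read/write, and lists the copy's regions. The bytes always match; the region layout reported for each file depends on the file system and is not guaranteed to survive the copy.

```python
from __future__ import annotations

import errno
import os
import tempfile


def data_regions(fd: int) -> list[tuple[int, int]]:
    """Return (start, end) data regions as reported by SEEK_DATA/SEEK_HOLE.

    A file system without hole tracking reports the whole file as one region.
    """
    size = os.fstat(fd).st_size
    regions, pos = [], 0
    while pos < size:
        try:
            start = os.lseek(fd, pos, os.SEEK_DATA)
        except OSError as e:
            if e.errno == errno.ENXIO:  # no data at or after pos
                break
            raise
        end = os.lseek(fd, start, os.SEEK_HOLE)  # EOF counts as a hole
        regions.append((start, end))
        pos = end
    return regions


def make_sparse(path: str, size: int = 1 << 20) -> int:
    """Hypothetical helper: write 4 bytes at each end, leaving a gap between."""
    with open(path, "wb") as f:
        f.write(b"head")
        f.seek(size - 4)
        f.write(b"tail")
    return size


def naive_copy(src: str, dst: str) -> None:
    """Byte-for-byte copy with read/write; the gap is written as explicit zeros."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while chunk := fin.read(65536):
            fout.write(chunk)


with tempfile.TemporaryDirectory() as d:
    src, dst = os.path.join(d, "src"), os.path.join(d, "dst")
    size = make_sparse(src)
    naive_copy(src, dst)
    with open(src, "rb") as a, open(dst, "rb") as b:
        same_bytes = a.read() == b.read()
        src_regions = data_regions(a.fileno())
        dst_regions = data_regions(b.fileno())

print(same_bytes)  # True
print(src_regions)
print(dst_regions)
```

The two region lists are printed without expected values on purpose: they vary with the file system, which is exactly the point being made.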

It seems to me that the desire for a quick performance win has led to
an inferior solution in the form of the SEEK_HOLE|DATA constructs. File
systems already support an optimization that treats sequences of zeros
as holes. This optimization can be generalized, as a compressing file
system would do, and it is completely transparent to ALL applications.
Beyond this, the file system has no business introducing unnecessary
semantics into a typeless stream of bytes.
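A sketch of that transparent alternative, again in Python and under stated assumptions: a fixed 4096-byte block size rather than one queried from the file system, and sparse_copy is a hypothetical name. The copier seeks over all-zero blocks instead of writing them, so the file system may store holes if it can, while every reader still sees identical bytes.

```python
import os
import tempfile

BLOCK = 4096  # assumed block size for zero detection; real tools may query the FS


def sparse_copy(src: str, dst: str, block: int = BLOCK) -> None:
    """Copy src to dst, seeking over all-zero blocks instead of writing them."""
    zero = bytes(block)
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while chunk := fin.read(block):
            if chunk == zero[: len(chunk)]:
                fout.seek(len(chunk), os.SEEK_CUR)  # leave a gap; may become a hole
            else:
                fout.write(chunk)
        # A trailing gap does not extend the file by itself, so set the size.
        fout.truncate(fout.tell())


with tempfile.TemporaryDirectory() as d:
    src, dst = os.path.join(d, "src"), os.path.join(d, "dst")
    with open(src, "wb") as f:
        f.write(b"x" * 10 + bytes(3 * BLOCK) + b"y" * 10 + bytes(BLOCK))
    sparse_copy(src, dst)
    with open(src, "rb") as a, open(dst, "rb") as b:
        identical = a.read() == b.read()

print(identical)  # True
```

GNU cp's --sparse=always option relies on a similar zero-detection idea; no application reading the copy needs to know it happened.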

There are only two ways to think about "absence of data". On the one
hand, if it is meaningful for the operations performed on a data stream,
the applications that require those semantics should encode it
explicitly with some signaling method. On the other hand, if "absence of
data" has no operational meaning and serves only to improve the
performance of disk storage and access, then the file system
implementation should be left to do what is right.

Phong

_______________________________________________
ast-users mailing list
[email protected]
https://mailman.research.att.com/mailman/listinfo/ast-users