Some discussion on IRC concerning editing manifest records in-place,
rather than via move-into-place of a tempfile, boiled down to "manifest
records should not cross OS page boundaries", and therefore "manifest
record length (i.e., the number of bytes for one revision's manifest
entry) should be a power of 2".

This came up in the context of a proposal for in-place editing of
manifests, which hopefully one of us will bring to this list in
a separate email if it proves fruitful.


[[[
00:54:59      @danielsh | when you say "I/O reordering", do you mean
00:55:11      @danielsh | "write()s by any one process are (not) done in 
chronological order"?
00:55:23       @stefan2 | exactly
00:55:32      @danielsh | ok
00:55:36      @danielsh | so, yes, that's one point I had in mind
00:55:41      @danielsh | today we only rely on move-into-place
00:55:52      @danielsh | any [pg]-esque suggestion means we require something 
new
00:56:05       @stefan2 | but they should be come visible in chronological 
order between processes - at least within the same
                        | address page
00:56:06      @danielsh | in this case, correct handling of overwriting of 
bytes in a file
00:56:34      @danielsh | re what you just said
00:56:40      @danielsh | why?  also between threads of the same process
00:56:53      @danielsh | [ and I don't follow how/why address spaces factor in 
]
00:57:24       @stefan2 | file caches use memory?
00:57:56      @danielsh | ah.
00:58:01       @stefan2 | the OS might reorder stuff for different pages but 
not for the same
00:58:15       @stefan2 | (no idea how relevant / likely that actually is)
00:58:18      @danielsh | so you said should == 'will probably be'   rather 
than  'should == need to be'
00:58:38       @stefan2 | yes
00:59:02      @danielsh | so.  if the manifest record crosses a page boundary 
we might have a VERY edge case bug?
00:59:42      @danielsh | have to admit I wouldn't have considered that... I 
would have stopped at the file level not at the
                        | page-that-file-is-cached-in level
01:01:16       @stefan2 | I have no idea how an OS makes shared file content 
visible to the respective processes. the edge
                        | case might happen only if a "record" crosses the page 
boundary
01:02:16      @danielsh | ack
01:02:49      @danielsh | you're saying the OS might not be ACID'ing the view 
of the file.  (since it presents a view that
                        | never was present on disk)
01:02:59      @danielsh | who knows, that may be a risk we'll take
01:03:25       @stefan2 | there is copy-on-write semantics for (some forms of) 
memory mapped files under windows. Depending on
                        | when the respective page is mapped it may see the old 
or new content (the old page got copied while
                        | the new one didn't need to back then)
01:03:45        @peterS | danielsh: if the manifest record crosses a page 
boundary, the reason this might matter is in case of
                        | an OS crash.  one block is written, the next not
01:04:29       @stefan2 | danielsh: yes. but I have no actual facts / pointers 
to support that suspicion.
01:04:34        @peterS | but if we're using 16-byte records and aligning to 16 
byte boundaries, it would take a very strange
                        | disk block size to make this possible
01:05:34        @peterS | if we think it's unreasonable to assume 1000 revprop 
blobs will always average less than 256 GB per
                        | blob, though, then we'll need more than 16 bytes.  or 
a binary representation as I've argued for
                        | before
01:05:34      @danielsh | stefan2, peterS: so between the two of you you're 
arguing that records should not cross page
                        | boundaries
01:05:56      @danielsh | fine, and I'll raise that point on dev@ for posterity
01:06:18        @peterS | I think that's reasonable.  and we don't even know 
the disk or filesystem block size, i.e., we can't
                        | _really_ assume it's greater than, say, 512 bytes
01:06:31        @peterS | so we really want to align on a power-of-two
01:06:33       @stefan2 | peterS: any 2^N size should work
01:06:42        @peterS | ...stefan2 agreed (:
01:06:48       @stefan2 | indeed ;)
01:07:09      @danielsh | +1
]]]
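
To make the alignment argument concrete, here is a minimal sketch (not
part of any proposal) that checks whether fixed-size records, laid out
back to back from offset 0, ever straddle a block boundary.  The helper
names and the block sizes of 512 and 4096 bytes are assumptions chosen
for illustration; the 16-byte record size is the one peterS mentions,
and 24 bytes stands in for an arbitrary non-power-of-two length.

```python
def crosses_boundary(offset, length, block_size):
    """Return True if bytes [offset, offset + length) span more than one block."""
    first_block = offset // block_size
    last_block = (offset + length - 1) // block_size
    return first_block != last_block


def any_record_crosses(record_size, block_size, n_records):
    """Return True if any of n_records back-to-back records straddles a block."""
    return any(
        crosses_boundary(i * record_size, record_size, block_size)
        for i in range(n_records)
    )


if __name__ == "__main__":
    # Block sizes are illustrative assumptions; the real value is unknown.
    for block_size in (512, 4096):
        for record_size in (16, 24):
            print(block_size, record_size,
                  any_record_crosses(record_size, block_size, 1000))
```

When both the record size and the block size are powers of two and the
record is no larger than the block, the record size divides the block
size, so an aligned record can never straddle a boundary.  A 24-byte
record does: the record starting at offset 504 runs past a 512-byte
boundary.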
