On 28/09/2023 at 18:57, Kent Overstreet wrote:
On Thu, Sep 28, 2023 at 10:12:18AM -0400, Brian Foster wrote:
On Wed, Sep 27, 2023 at 06:08:21PM -0400, Kent Overstreet wrote:
On Wed, Sep 27, 2023 at 07:23:37AM -0400, Brian Foster wrote:
An fsstress task on a big endian system (s390x) quickly produces a
bunch of CRC errors in the system logs. Most of these are related to
the narrow CRCs path, but the fundamental problem can be reduced to
a single write and re-read (after dropping caches) of a previously
merged extent.

The key merge path that handles extent merges eventually calls into
bch2_checksum_merge() to combine the CRCs of the associated extents.
This code attempts to avoid a byte order swap by feeding the le64
values into the crc32c code, but the latter casts the resulting u64
value down to a u32, which truncates the high bytes where the actual
CRC value ends up. This results in a CRC value that does not change
(since it is merged with a CRC of 0), and checksum failures ensue.

Fix the checksum merge code to convert to CPU byte order at the
boundaries with the external CRC code, so that any value casting is
handled properly.
Thanks! Applied.

We really need to test creating a filesystem and then reading from it on
an opposite endianness machine, have you gotten a chance to do that?

I gave it a quick test by just dd'ing the disk image off my fstests
TEST_DEV from the BE box I've been playing with and mounting it on an LE
system. The fs mounts, but eventually complains about a backpointer
issue after some stress I/O:

  bcachefs (loop0): error validating btree node at btree backpointers level 0/1
    u64s 11 type btree_ptr_v2 0:5342578688:0 len 0 ver 0: seq 8574dcb72b17e918 written 486 min_key 0:3338403840:1 durability: 1 ptr: 0:10388:0 gen 6
    node offset 486 bset u64s 1300: invalid bkey: backpointer at wrong pos
    u64s 9 type backpointer 0:3339255808:0 len 0 ver 0: bucket=0:6369:0 btree=extents l=0 offset=0:256 len=64 pos=536913736:256:U32_MAX, shutting down
  bcachefs (loop0): inconsistency detected - emergency read only
  bcachefs (loop0): __bch2_btree_write_buffer_flush: insert error EIO
  bcachefs (loop0 inum 201326618 offset 246272): write error while doing btree update: EIO

... and fsck similarly complains about a bunch more backpointer and LRU-related
inconsistencies. Write buffer issue, perhaps? At a glance, that seq
value looks kind of bogus, but I haven't had a chance to dig into the
details yet. Everything seems in order with the same image file on the
BE box, FWIW.
bch_backpointer looks highly suspect re: endianness; if it's not fixable,
we'll have to do a bch_backpointer_v2. I expect it will be fixable,
though, just tricky.

The LRU btree is just a bitset btree now, so that shouldn't have
endianness issues.

So yeah, we definitely need to get automated foreign endianness testing
going - if there's going to be more of this I don't want us to be doing
this by hand, and we need to make sure issues like this get caught in
the future.

jpsollie was looking at ktest support for big endian architectures
recently and had some patches for that, I just haven't had time to look
at them - jpsollie, do you think you can post those patches to the list?


Currently finalizing them.

The issue, mostly, is that all BE architectures for Debian are unofficial
ports, and to use an unofficial port you need Debian sid, which is
trial-and-error when it comes to dependencies.

I'll create patches from the commits in the PR.

Janpieter Sollie
