On Wed, Sep 27, 2023 at 06:08:21PM -0400, Kent Overstreet wrote:
> On Wed, Sep 27, 2023 at 07:23:37AM -0400, Brian Foster wrote:
> > An fsstress task on a big endian system (s390x) quickly produces a
> > bunch of CRC errors in the system logs. Most of these are related to
> > the narrow CRCs path, but the fundamental problem can be reduced to
> > a single write and re-read (after dropping caches) of a previously
> > merged extent.
> >
> > The key merge path that handles extent merges eventually calls into
> > bch2_checksum_merge() to combine the CRCs of the associated extents.
> > This code attempts to avoid a byte order swap by feeding the le64
> > values into the crc32c code, but the latter casts the resulting u64
> > value down to a u32, which truncates the high bytes where the actual
> > crc value ends up. This results in a CRC value that does not change
> > (since it is merged with a CRC of 0), and checksum failures ensue.
> >
> > Fix the checksum merge code to swap to cpu byte order on the
> > boundaries to the external crc code such that any value casting is
> > handled properly.
>
> Thanks! Applied.
>
> We really need to test creating a filesystem and then reading from it on
> an opposite endianness machine, have you gotten a chance to do that?
>
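For illustration only, the truncation described in the quoted commit
message can be sketched in portable userspace C. This is not the
bcachefs code: every function name below is hypothetical, and the
explicit byte reversal merely simulates how a big endian CPU would see
an le64 field when it loads it as a native u64.

```c
#include <stdint.h>

/* Reverse the byte order of a 64-bit value (hypothetical helper). */
static uint64_t swab64_demo(uint64_t v)
{
	uint64_t out = 0;
	int i;

	for (i = 0; i < 8; i++) {
		out = (out << 8) | (v & 0xff);
		v >>= 8;
	}
	return out;
}

/*
 * Simulate a big endian CPU loading an le64 field as a native u64:
 * the bytes come out reversed relative to the cpu-order value.
 */
static uint64_t le64_seen_by_be_cpu(uint64_t cpu_val)
{
	return swab64_demo(cpu_val);
}

/* Problem pattern: the raw le64 value passes through a u32 cast. */
static uint32_t crc_truncated_without_swap(uint64_t cpu_crc)
{
	uint64_t raw = le64_seen_by_be_cpu(cpu_crc);

	/* The 32-bit crc sits in the high bytes of raw, so it is lost. */
	return (uint32_t)raw;
}

/* Fix pattern: convert back to cpu byte order before the cast. */
static uint32_t crc_truncated_with_swap(uint64_t cpu_crc)
{
	uint64_t raw = le64_seen_by_be_cpu(cpu_crc);
	uint64_t cpu = swab64_demo(raw);

	return (uint32_t)cpu;
}
```

With a crc of 0x12345678, the unswapped path yields 0 while the swapped
path preserves the value, which is consistent with the "CRC value that
does not change" symptom described above.
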
I gave it a quick test by just dd'ing the disk image off my fstests
TEST_DEV from the BE box I've been playing with and mounting it on a LE
system. The fs mounts, but eventually complains about a backpointer
issue after some stress I/O:

bcachefs (loop0): error validating btree node at btree backpointers level 0/1
  u64s 11 type btree_ptr_v2 0:5342578688:0 len 0 ver 0: seq 8574dcb72b17e918 written 486 min_key 0:3338403840:1 durability: 1 ptr: 0:10388:0 gen 6
  node offset 486 bset u64s 1300: invalid bkey: backpointer at wrong pos
  u64s 9 type backpointer 0:3339255808:0 len 0 ver 0: bucket=0:6369:0 btree=extents l=0 offset=0:256 len=64 pos=536913736:256:U32_MAX, shutting down
bcachefs (loop0): inconsistency detected - emergency read only
bcachefs (loop0): __bch2_btree_write_buffer_flush: insert error EIO
bcachefs (loop0 inum 201326618 offset 246272): write error while doing btree update: EIO

... and fsck similarly complains about a bunch more bp and lru related
inconsistencies. Write buffer issue, perhaps? At a glance, that seq
value looks kind of bogus, but I haven't had a chance to dig into the
details yet. Everything seems in order with the same image file on the
BE box, FWIW.

Brian

> Otherwise, there's the big endian support in ktest to start looking at
> again.
>
