On Wed, Sep 27, 2023 at 06:08:21PM -0400, Kent Overstreet wrote:
> On Wed, Sep 27, 2023 at 07:23:37AM -0400, Brian Foster wrote:
> > An fsstress task on a big endian system (s390x) quickly produces a
> > bunch of CRC errors in the system logs. Most of these are related to
> > the narrow CRCs path, but the fundamental problem can be reduced to
> > a single write and re-read (after dropping caches) of a previously
> > merged extent.
> > 
> > The key merge path that handles extent merges eventually calls into
> > bch2_checksum_merge() to combine the CRCs of the associated extents.
> > This code attempts to avoid a byte order swap by feeding the le64
> > values into the crc32c code, but the latter casts the resulting u64
> > value down to a u32, which truncates the high bytes where the actual
> > crc value ends up. This results in a CRC value that does not change
> > (since it is merged with a CRC of 0), and checksum failures ensue.
> > 
> > Fix the checksum merge code to swap to cpu byte order on the
> > boundaries to the external crc code such that any value casting is
> > handled properly.
> 
> Thanks! Applied.
> 
> We really need to test creating a filesystem and then reading from it on
> an opposite endianness machine, have you gotten a chance to do that?
> 

I gave it a quick test by just dd'ing the disk image off my fstests
TEST_DEV from the BE box I've been playing with and mounting it on a LE
system. The fs mounts, but eventually complains about a backpointer
issue after some stress I/O:

 bcachefs (loop0): error validating btree node at btree backpointers level 0/1
   u64s 11 type btree_ptr_v2 0:5342578688:0 len 0 ver 0: seq 8574dcb72b17e918 written 486 min_key 0:3338403840:1 durability: 1 ptr: 0:10388:0 gen 6
   node offset 486 bset u64s 1300: invalid bkey: backpointer at wrong pos
   u64s 9 type backpointer 0:3339255808:0 len 0 ver 0: bucket=0:6369:0 btree=extents l=0 offset=0:256 len=64 pos=536913736:256:U32_MAX, shutting down
 bcachefs (loop0): inconsistency detected - emergency read only
 bcachefs (loop0): __bch2_btree_write_buffer_flush: insert error EIO
 bcachefs (loop0 inum 201326618 offset 246272): write error while doing btree update: EIO

... and fsck similarly complains about a number of other backpointer and
LRU related inconsistencies. Write buffer issue, perhaps? At a glance,
that seq value looks kind of bogus, but I haven't had a chance to dig
into the details yet. Everything seems in order with the same image file
on the BE box, FWIW.

Brian

> Otherwise, there's the big endian support in ktest to start looking at
> again.
> 
