On Tue, Jul 18, 2017 at 6:08 AM, Marcus Furlong <furlo...@gmail.com> wrote:
> On 22 March 2017 at 05:51, Dan van der Ster <d...@vanderster.com> wrote:
>> On Wed, Mar 22, 2017 at 8:24 AM, Marcus Furlong <furlo...@gmail.com>
>> wrote:
>>> Hi,
>>>
>>> I'm experiencing the same issue as outlined in this post:
>>>
>>>
>>> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-September/013330.html
>>>
>>> I have also deployed this jewel cluster using ceph-deploy.
>>>
>>> This is the message I see at boot (happens for all drives, on all OSD
>>> nodes):
>>>
>>> [ 92.938882] XFS (sdi1): Mounting V5 Filesystem
>>> [ 93.065393] XFS (sdi1): Ending clean mount
>>> [ 93.175299] attempt to access beyond end of device
>>> [ 93.175304] sdi1: rw=0, want=19134412768, limit=19134412767
>>>
>>> and again while the cluster is in operation:
>>>
>>> [429280.254400] attempt to access beyond end of device
>>> [429280.254412] sdi1: rw=0, want=19134412768, limit=19134412767
>>>
>>
>> We see these as well, and I'm also curious what's causing it. Perhaps
>> sgdisk is doing something wrong when creating the ceph-data partition?
>
> Apologies for reviving an old thread, but I figured out what happened and
> never documented it, so I thought an update might be useful.
>
> The disk layout I've ascertained is as follows:
>
> sector 0 = protective MBR (or empty)
> sectors 1 to 33 = GPT (33 sectors)
> sectors 34 to 2047 = free (as confirmed by sgdisk -f -E)
> sectors 2048 to 19134414814 (19134412767 sectors: Data Partition 1)
> sectors 19134414815 to 19134414847 (33 sectors: GPT backup data)
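>
> As a quick cross-check of the geometry (the backup GPT takes the last 33
> sectors, so the last usable sector is total - 34):
>
> # echo $(( 19134414848 - 34 ))   <- 19134414814, matching sgdisk's
>                                     "last usable sector" below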
>
> And the error:
>
> [ 92.938882] XFS (sdi1): Mounting V5 Filesystem
> [ 93.065393] XFS (sdi1): Ending clean mount
> [ 93.175299] attempt to access beyond end of device
> [ 93.175304] sdi1: rw=0, want=19134412768, limit=19134412767
>
> This shows that the error occurs when trying to access sector 19134412768 of
> Partition 1 which, as we can see from the above, doesn't exist.
>
> I noticed that the file system size is 3.5KiB less than the size of the
> partition, and the XFS block size is 4KiB.
>
> EMDS = 19134412767 * 512  = 9796819336704 <- actual partition size
> CDS  =  9567206383 * 1024 = 9796819336192 <- 512 bytes less than EMDS;
>        this is what /proc/partitions shows, since its 1024-byte units
>        round the odd sector count down
> FSS  =  2391801595 * 4096 = 9796819333120 <- the filesystem, 3072 bytes
>        less than CDS
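>
> These numbers can be read straight off the system (a quick sketch, device
> names as above):
>
> # blockdev --getsz /dev/sdi1        <- size in 512-byte sectors (EMDS/512)
> # grep sdi1 /proc/partitions        <- size in 1024-byte units (CDS/1024)
> # xfs_info /dev/sdi1 | grep '^data' <- bsize and blocks; bsize * blocks = FSS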
>
> It turns out that if I create a partition whose size is a multiple of the
> XFS filesystem's 4KiB block size, the error does not occur, i.e. there is
> no error when the filesystem starts _and_ ends exactly on the partition
> boundaries.
>
> For example, the following partition, which is 7 sectors smaller than the
> one referenced above, causes no issue:
>
> # sgdisk --new=0:2048:19134414807 -- /dev/sdi
> Creating new GPT entries.
> The operation has completed successfully.
>
> # sgdisk -p /dev/sdi
> Disk /dev/sdi: 19134414848 sectors, 8.9 TiB
> Logical sector size: 512 bytes
> Disk identifier (GUID): 3E61A8BA-838A-4D7E-BB8E-293972EB45AE
> Partition table holds up to 128 entries
> First usable sector is 34, last usable sector is 19134414814
> Partitions will be aligned on 2048-sector boundaries
> Total free space is 2021 sectors (1010.5 KiB)
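>
> A quick way to sanity-check the end alignment (8 sectors = one 4KiB block
> with 512-byte sectors):
>
> # echo $(( (19134414807 - 2048 + 1) % 8 ))   <- 0, so the partition is a
>                                                 whole number of 4KiB blocks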
>
> When the end of the partition is not aligned to the 4KiB blocks used by XFS,
> the error occurs. This explains why the defaults from parted work correctly,
> as the 1MiB "padding" is 4K-aligned.
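>
> For comparison, something like the following parted invocation (an
> illustration, not what ceph-deploy runs) leaves 1MiB at each end, so both
> partition boundaries are 4K-aligned:
>
> # parted -s /dev/sdi mklabel gpt mkpart primary 1MiB -1MiB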
>
> This non-alignment happens because ceph-deploy uses sgdisk, and sgdisk seems
> to align the start of the partition with 2048-sector boundaries, but _not_
> the end of the partition, when used with the -L parameter.
>
> The fix was to recreate the partition table, shrinking the partition down
> so that it exactly matches the filesystem size:
>
> https://gist.github.com/furlongm/292aefa930f40dc03f21693d1fc19f35
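>
> In essence (a sketch of the arithmetic only, not the full script linked
> above), the new end sector falls out of the XFS block count:
>
> fs_blocks=2391801595                      # data blocks, from xfs_info
> fs_sectors=$(( fs_blocks * 4096 / 512 ))  # = 19134412760
> new_end=$(( 2048 + fs_sectors - 1 ))      # = 19134414807, as used above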
>
> In my testing, I could only reproduce this with XFS, not with other
> filesystems. It can be reproduced on smaller XFS filesystems but seems to
> take more time.
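>
> For anyone wanting a scratch repro, something like this should set up the
> same misalignment on a loop device (a rough sketch; sizes chosen for
> illustration, and the error only appears under subsequent I/O):
>
> # truncate -s 10G /tmp/xfs-test.img
> # losetup -fP --show /tmp/xfs-test.img        <- e.g. /dev/loop0
> # sgdisk --new=1:2048:20971478 -- /dev/loop0  <- 20969431 sectors, not a
>                                                  multiple of 8
> # mkfs.xfs /dev/loop0p1 && mount /dev/loop0p1 /mnt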

Great work. I've tested it (in print mode) and it seems to detect things
correctly here:

/dev/sdz1
OSD ID : 88
Partition size in sectors : 11721043087
Sector size               : 512
Partition size in bytes   : 6001174060544
XFS block size            : 4096
# of XFS blocks           : 1465130385
XFS filesystem size       : 6001174056960
Unused sectors            : 7
Unused bytes (unused sector count * sector size) : 3584
Unused bytes (partition size - filesystem size)  : 3584
Filesystem is not correctly aligned to partition boundary :-(
systemctl stop ceph-osd@88
umount /dev/sdz1
sgdisk --delete=1 -- /dev/sdz
sgdisk --new=1:2048:11721045127 --change-name=1:"ceph data" \
  --partition-guid=1:c0832f78-5d7c-49f7-a133-786424b8b491 \
  --typecode=1:89c57f98-2fe5-4dc0-89c1-f3ad0ceff2be -- /dev/sdz
partprobe /dev/sdz
xfs_repair /dev/sdz1
sgdisk --typecode=1:4fbd7e29-9d25-41b8-afd0-062c0ceff05d -- /dev/sdz
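
A quick sanity check after the realignment, I suppose, is that the
partition size in bytes then exactly equals the filesystem size
(6001174056960 here):

# echo $(( $(blockdev --getsz /dev/sdz1) * 512 ))
# xfs_info /dev/sdz1 | grep '^data'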


But one thing is still unclear to me. sgdisk is not aligning the end
of the partition -- fine. But xfs creates a filesystem that fits
within that partition - i.e. the filesystem size is smaller (by 7
sectors) than the partition. So, what exactly is trying to access
outside the partition?

sdz1: rw=0, want=11721043088, limit=11721043087

Are we sure that there is no filesystem data in those 7 sectors? The
attempted access (end-of-filesystem + 8 sectors) would be the first
sector of the GPT backup. Have you checked if the backup is
uncorrupted?
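
For what it's worth, sgdisk's verify mode should complain if the backup
header or table had been overwritten (assuming its CRC checks cover them):

# sgdisk -v /dev/sdz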

(And those xfs_aops oopses, which were thought to be unrelated -- did you
nevertheless see them disappear after you fixed your partition
alignments?)

Basically, I'm still wondering if this is all harmless, or if we
really do need to realign these partitions.

Cheers, Dan