Re: [ceph-users] Package availability for Debian / Ubuntu

2019-05-16 Thread Christian Balzer
Hello, It's now May and nothing has changed here or in the tracker for the related Bionic issue. At this point in time it feels like Redhat/DIY or bust, neither of which is a very enticing prospect. Definitely not going to deploy a Stretch and Luminous cluster next, in July. Christian On Thu, 20 Dec

Re: [ceph-users] MDS Crashing 14.2.1

2019-05-16 Thread Adam Tygart
I ended up backing up the journals of the MDS ranks, running recover_dentries for both of them, and resetting the journals and session table. It is back up. The recover_dentries stage didn't show any errors, so I'm not even sure why the MDS was asserting about duplicate inodes. -- Adam On Thu, May 16,
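For anyone finding this later, the sequence described above looks roughly like the following (a sketch only; the filesystem name "cephfs" and the two ranks are assumptions based on the thread, and the official disaster-recovery documentation should be read before running any of it):

    # back up each active rank's journal first
    cephfs-journal-tool --rank=cephfs:0 journal export backup.rank0.bin
    cephfs-journal-tool --rank=cephfs:1 journal export backup.rank1.bin

    # replay recoverable dentries from the journals into the metadata pool
    cephfs-journal-tool --rank=cephfs:0 event recover_dentries summary
    cephfs-journal-tool --rank=cephfs:1 event recover_dentries summary

    # reset the journals and the session table
    cephfs-journal-tool --rank=cephfs:0 journal reset
    cephfs-journal-tool --rank=cephfs:1 journal reset
    cephfs-table-tool all reset session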

Re: [ceph-users] Samba vfs_ceph or kernel client

2019-05-16 Thread Maged Mokhtar
Thanks a lot for the clarification.  /Maged On 16/05/2019 17:23, David Disseldorp wrote: Hi Maged, On Fri, 10 May 2019 18:32:15 +0200, Maged Mokhtar wrote: What is the recommended way for Samba gateway integration: using vfs_ceph or mounting CephFS via kernel client? I tested the kernel

Re: [ceph-users] Lost OSD from PCIe error, recovered, HOW to restore OSD process

2019-05-16 Thread Mark Lehrer
> Steps 3-6 are to get the drive lvm volume back How much longer will we have to deal with LVM? If we can migrate non-LVM drives from earlier versions, how about we give ceph-volume the ability to create non-LVM OSDs directly? On Thu, May 16, 2019 at 1:20 PM Tarek Zegar wrote: > FYI for

Re: [ceph-users] Lost OSD from PCIe error, recovered, HOW to restore OSD process

2019-05-16 Thread Tarek Zegar
FYI for anyone interested, below is how to recover from someone removing an NVMe drive (the first two steps show how mine were removed and brought back). Steps 3-6 get the drive's LVM volume back AND get the OSD daemon running for the drive. 1. echo 1 >
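A condensed sketch of what the later steps typically amount to; this is not Tarek's exact procedure, and the rescan path, VG name and OSD id below are placeholders:

    # rescan the PCIe bus so the removed NVMe device reappears
    echo 1 > /sys/bus/pci/rescan

    # re-activate the LVM volume that ceph-volume created on the drive
    pvscan --cache
    vgchange -ay ceph-block-0              # placeholder VG name

    # re-activate the OSD so systemd starts the daemon again
    ceph-volume lvm activate --all
    systemctl status ceph-osd@0            # placeholder OSD id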

Re: [ceph-users] MDS Crashing 14.2.1

2019-05-16 Thread Adam Tygart
Hello all, The rank 0 mds is still asserting. Is this duplicate inode situation one that I should be considering using the cephfs-journal-tool to export, recover dentries and reset? Thanks, Adam On Thu, May 16, 2019 at 12:51 AM Adam Tygart wrote: > > Hello all, > > I've got a 30 node cluster

[ceph-users] Is it possible to hide slow ops resulting from bugs?

2019-05-16 Thread Jean-Philippe Méthot
Hi, Lately we’ve had to deal with https://tracker.ceph.com/issues/24531, which constantly triggers slow ops warning messages in our ceph health. As per the bug report, these appear to be only cosmetic and in no way affect the workings of the cluster. Is

Re: [ceph-users] How do you deal with "clock skew detected"?

2019-05-16 Thread Uwe Sauter
You could also edit your ceph-mon@.service (assuming systemd) to depend on chrony and add a line "ExecStartPre=/usr/bin/sleep 30" to delay startup and give chrony a chance to sync before the mon is started. On 16.05.19 at 17:38, Stefan Kooman wrote: Quoting Jan Kasprzak
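Rather than editing the packaged unit file directly, a drop-in achieves the same thing and survives package upgrades; a sketch (the chrony unit is named chronyd.service on EL-style systems, chrony.service on Debian/Ubuntu):

    # /etc/systemd/system/ceph-mon@.service.d/wait-for-chrony.conf
    [Unit]
    After=chronyd.service
    Wants=chronyd.service

    [Service]
    ExecStartPre=/usr/bin/sleep 30

    # then: systemctl daemon-reload && systemctl restart ceph-mon@$(hostname -s)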

Re: [ceph-users] How do you deal with "clock skew detected"?

2019-05-16 Thread Stefan Kooman
Quoting Jan Kasprzak (k...@fi.muni.cz): > OK, many responses (thanks for them!) suggest chrony, so I tried it: > With all three mons running chrony and being in sync with my NTP server > with offsets under 0.0001 second, I rebooted one of the mons: > > There still was the HEALTH_WARN
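The monitors' own view of the skew can also be checked directly, which helps distinguish a transient post-reboot blip from real drift; for example:

    ceph time-sync-status    # per-mon skew/latency as seen by the leader
    ceph health detail       # shows which mon is flagged and by how much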

Re: [ceph-users] Samba vfs_ceph or kernel client

2019-05-16 Thread David Disseldorp
Hi Maged, On Fri, 10 May 2019 18:32:15 +0200, Maged Mokhtar wrote: > What is the recommended way for Samba gateway integration: using > vfs_ceph or mounting CephFS via kernel client? I tested the kernel > solution in a ctdb setup and it gave good performance; does it have any > limitations
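For comparison, a minimal vfs_ceph share looks roughly like the following (a sketch; the share name, cephx user and paths are made up):

    [cephfs]
        path = /
        vfs objects = ceph
        ceph:config_file = /etc/ceph/ceph.conf
        ceph:user_id = samba
        kernel share modes = no
        read only = no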

Re: [ceph-users] Huge rebalance after rebooting OSD host (Mimic)

2019-05-16 Thread kas
huang jun wrote: : was the OSDs' crush location changed after the reboot? I am not sure which reboot you mean, but to sum up what I wrote in previous messages in this thread, it probably went as follows: - reboot of the OSD server - the server comes up with the wrong hostname "localhost" -
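One common guard against exactly this failure mode (OSDs re-registering under a bogus "localhost" host bucket after a reboot with the wrong hostname) is to stop OSDs from updating their crush location on start, or to pin it explicitly; a ceph.conf sketch:

    [osd]
        osd crush update on start = false
        # or pin the location instead of trusting the hostname:
        # osd crush location = "host=myhost rack=rack1 root=default"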

Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS bug?)

2019-05-16 Thread Yan, Zheng
On Thu, May 16, 2019 at 4:10 PM Frank Schilder wrote: > > Dear Yan and Stefan, > > thanks for the additional information, it should help reproducing the issue. > > The pdsh command executes a bash script that echoes a few values to stdout. > Access should be read-only, however, we still have the

Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS bug?)

2019-05-16 Thread Frank Schilder
Dear Yan, it is difficult to push the MDS to err in this special way. Is it advisable or not to increase the likelihood and frequency of dirfrag operations by tweaking some of the parameters mentioned here: http://docs.ceph.com/docs/mimic/cephfs/dirfrags/. If so, what would reasonable values
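For reference, the knobs on that page and their Mimic defaults are listed below; lowering mds_bal_split_size, for instance, would make splits more frequent, though whether that is advisable is exactly the open question here, so treat this as documentation rather than a recommendation:

    mds_bal_split_size = 10000       # dirents before a fragment is queued for splitting
    mds_bal_merge_size = 50          # dirents below which fragments are merged
    mds_bal_split_bits = 3           # a fragment splits into 2^3 = 8 pieces
    mds_bal_fragment_interval = 5    # seconds between marking and executing a split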

Re: [ceph-users] Poor performance for 512b aligned "partial" writes from Windows guests in OpenStack + potential fix

2019-05-16 Thread Marc Roos
Hmmm, looks like diskpart is off; it reports the same for a volume where fsutil fsinfo ntfsinfo c: reports 512 (in this case correct, because it is on an SSD). Anyone know how to use fsutil with a disk mounted on a path (without a drive letter)? -Original Message- From: Marc Roos Sent:

Re: [ceph-users] Poor performance for 512b aligned "partial" writes from Windows guests in OpenStack + potential fix

2019-05-16 Thread Marc Roos
I am not sure whether it is possible to run fsutil on a disk that has no drive letter but is mounted on a path. So I used: diskpart, select volume 3, filesystems. And it gives me this: Current File System Type : NTFS Allocation Unit Size : 4096 Flags : File Systems Supported for
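Spelled out, the two checks being compared in this sub-thread are roughly the following; fsutil takes a volume pathname, which may also work with a mounted folder rather than a drive letter (untested here, and the mount-point path is a made-up example):

    rem interactively, in diskpart:
    diskpart
      select volume 3
      filesystems

    rem fsutil against a drive letter or (possibly) a mounted-folder path:
    fsutil fsinfo ntfsinfo C:
    fsutil fsinfo ntfsinfo C:\mounts\data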

[ceph-users] Repairing PG inconsistencies — Ceph Documentation - where's the text?

2019-05-16 Thread Stuart Longland
Hi all, I've got a placement group on a cluster that just refuses to clear itself up. Long story short, one of my storage nodes (combined OSD+MON with a single OSD disk) in my 3-node storage cluster keeled over, and in the short term, I'm running its OSD in a USB HDD dock on one of the remaining
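For anyone landing on this thread while the documentation page is empty, the usual sequence for an inconsistent PG is roughly this (a sketch; the pg id is a placeholder):

    ceph health detail                               # find the inconsistent pg, e.g. 2.1f
    rados list-inconsistent-obj 2.1f --format=json-pretty
    ceph pg repair 2.1f                              # ask the primary OSD to repair it
    ceph -w                                          # watch for the scrub/repair result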

Re: [ceph-users] How do you deal with "clock skew detected"?

2019-05-16 Thread Jan Kasprzak
Konstantin Shalygin wrote: : >how do you deal with the "clock skew detected" HEALTH_WARN message? : > : >I think the internal RTC in most x86 servers does have 1 second resolution : >only, but Ceph skew limit is much smaller than that. So every time I reboot : >one of my mons (for kernel upgrade
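The limit being referred to is mon_clock_drift_allowed, 0.05 s by default, with the warning repeated on a backoff; both can be relaxed if the message is considered noise (the values below are only an example):

    # ceph.conf, [mon] or [global]
    mon clock drift allowed = 0.5        # default 0.05
    mon clock drift warn backoff = 30    # default 5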

Re: [ceph-users] ceph -s finds 4 pools but ceph osd lspools says no pool which is the expected answer

2019-05-16 Thread Rainer Krienke
Hello Greg, thank you very much for your hint. If I should see this problem again I will try to restart the ceph-mgr daemon and see if this helps. Rainer > > I don't really see how this particular error can happen and be > long-lived, but if you restart the ceph-mgr it will probably resolve >
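For the record, restarting the mgr is just the usual systemd dance (the instance id is normally the short hostname):

    systemctl restart ceph-mgr@$(hostname -s).service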

Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS bug?)

2019-05-16 Thread Frank Schilder
Dear Yan and Stefan, thanks for the additional information, it should help reproduce the issue. The pdsh command executes a bash script that echoes a few values to stdout. Access should be read-only, however, we still have the FS mounted with atime enabled, so there is probably metadata
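If atime turns out to matter, the kernel client accepts the usual VFS mount options, so disabling it is a one-word change; a sketch (the monitor address, client name and secret file are placeholders):

    mount -t ceph 192.168.1.1:6789:/ /mnt/cephfs \
        -o name=cephfs_user,secretfile=/etc/ceph/cephfs.secret,noatime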

Re: [ceph-users] Poor performance for 512b aligned "partial" writes from Windows guests in OpenStack + potential fix

2019-05-16 Thread Trent Lloyd
For libvirt VMs, first you need to add "" to the relevant sections, and then stop/start the VM to apply the change. Then you need to make sure your VirtIO drivers (the Fedora/Red Hat variety, anyway) are from late 2018 or so. There was a bug fixed around July 2018; before that date, the
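The element stripped out by the mail archive above is presumably libvirt's blockio tag; purely as an illustration (the 4096-byte values are an assumption based on the subject of this thread), each relevant disk section would gain something like:

    <disk type='network' device='disk'>
      ...
      <blockio logical_block_size='4096' physical_block_size='4096'/>
      ...
    </disk>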

Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS bug?)

2019-05-16 Thread Stefan Kooman
Quoting Frank Schilder (fr...@dtu.dk): > Dear Stefan, > > thanks for the fast reply. We encountered the problem again, this time in a > much simpler situation; please see below. However, let me start with your > questions first: > > What bug? -- In a single-active MDS set-up, should there ever

Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS bug?)

2019-05-16 Thread Frank Schilder
Dear Yan, OK, I will try to trigger the problem again and dump the information requested. Since it is not easy to get into this situation and I usually need to resolve it fast (it's not a test system), is there anything else worth capturing? I will get back as soon as it happens again. In the

Re: [ceph-users] Poor performance for 512b aligned "partial" writes from Windows guests in OpenStack + potential fix

2019-05-16 Thread Alexandre DERUMIER
Many thanks for the analysis! I'm going to test with 4K on a heavy MSSQL database to see if I see an improvement in IOPS/latency. I'll report results in this thread. - Original Message - From: "Trent Lloyd" To: "ceph-users" Sent: Friday, 10 May 2019 09:59:39 Subject: [ceph-users] Poor

Re: [ceph-users] RBD Pool size doubled after upgrade to Nautilus and PG Merge

2019-05-16 Thread Wido den Hollander
On 5/12/19 4:21 PM, Thore Krüss wrote: > Good evening, > after upgrading our cluster yesterday to Nautilus (14.2.1) and pg-merging an > imbalanced pool we noticed that the number of objects in the pool has doubled > (rising synchronously with the merge progress). > > What happened there? Was

Re: [ceph-users] Grow bluestore PV/LV

2019-05-16 Thread Michael Andersen
Thanks! I'm on mimic for now, but I'll give it a shot on a test nautilus cluster. On Wed, May 15, 2019 at 10:58 PM Yury Shevchuk wrote: > Hello Michael, > > growing (expanding) bluestore OSD is possible since Nautilus (14.2.0) > using bluefs-bdev-expand tool as discussed in this thread: > >
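For reference, the Nautilus-era sequence discussed in that thread is roughly the following (a sketch; the VG/LV names and OSD id are placeholders, and the OSD should be stopped while its device is expanded):

    systemctl stop ceph-osd@2
    lvextend -L +100G /dev/ceph-vg/osd-block-2
    ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-2
    systemctl start ceph-osd@2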