This issue was related to using jemalloc. Jemalloc is not as well
tested with Bluestore and led to lots of segfaults for us. We moved back
to the default of tcmalloc with Bluestore and the segfaults stopped.

Check /etc/sysconfig/ceph under RHEL-based distros.
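
For reference, this is roughly what to look for there; the jemalloc
preload line and library path below are just examples of what such an
override typically looks like, not necessarily what's on your box:

    # /etc/sysconfig/ceph
    # Remove/comment a jemalloc preload like this if one was added:
    #LD_PRELOAD=/usr/lib64/libjemalloc.so.1

    # The stock file normally only tunes the tcmalloc thread cache:
    TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728

Then restart the OSDs on that host, e.g.:

    systemctl restart ceph-osd.target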

--
Adam
On Mon, Aug 27, 2018 at 9:51 PM Tyler Bishop
<tyler.bis...@beyondhosting.net> wrote:
>
> Did you solve this?  Similar issue.
>
>
> On Wed, Feb 28, 2018 at 3:46 PM Kyle Hutson <kylehut...@ksu.edu> wrote:
>>
>> I'm following up from a while ago. I don't think this is the same bug. The
>> bug referenced shows "abort: Corruption: block checksum mismatch", and I'm
>> not seeing that on mine.
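>>
>> (A quick way to check, assuming the default log location, is to grep
>> that OSD's log for the string, e.g.:
>>
>>     grep 'block checksum mismatch' /var/log/ceph/ceph-osd.414.log
>> )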
>>
>> Now I've had 8 OSDs down on this one server for a couple of weeks, and I 
>> just tried to start it back up. Here's a link to the log of that OSD (which 
>> segfaulted right after starting up): 
>> http://people.beocat.ksu.edu/~kylehutson/ceph-osd.414.log
>>
>> To me, it looks like the logs are providing surprisingly few hints as to 
>> where the problem lies. Is there a way I can turn up logging to see if I can 
>> get any more info as to why this is happening?
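>>
>> Something like this in ceph.conf on that host (then restarting the
>> OSD) is what I have in mind -- the exact debug subsystems and levels
>> are just a guess on my part:
>>
>>     [osd]
>>         debug osd = 20
>>         debug bluestore = 20
>>         debug bluefs = 20
>>         debug rocksdb = 10
>>
>> or passing them when starting the daemon by hand, e.g.
>> 'ceph-osd -i 414 --debug-bluestore 20'.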
>>
>> On Thu, Feb 8, 2018 at 3:02 AM, Mike O'Connor <m...@oeg.com.au> wrote:
>>>
>>> On 7/02/2018 8:23 AM, Kyle Hutson wrote:
>>> > We had a 26-node production ceph cluster which we upgraded to Luminous
>>> > a little over a month ago. I added a 27th-node with Bluestore and
>>> > didn't have any issues, so I began converting the others, one at a
>>> > time. The first two went off pretty smoothly, but the 3rd is doing
>>> > something strange.
>>> >
>>> > Initially, all the OSDs came up fine, but then some started to
>>> > segfault. Out of curiosity more than anything else, I did reboot the
>>> > server to see if it would get better or worse, and it pretty much
>>> > stayed the same - 12 of the 18 OSDs did not properly come up. Of
>>> > those, 3 again segfaulted.
>>> >
>>> > I picked one that didn't properly come up and copied the log to where
>>> > anybody can view it:
>>> > http://people.beocat.ksu.edu/~kylehutson/ceph-osd.426.log
>>> >
>>> > You can contrast that with one that is up:
>>> > http://people.beocat.ksu.edu/~kylehutson/ceph-osd.428.log
>>> >
>>> > (which is still showing segfaults in the logs, but seems to be
>>> > recovering from them OK?)
>>> >
>>> > Any ideas?
>>> Ideas? Yes.
>>>
>>> There is a bug that is hitting a small number of systems, and at this
>>> time there is no solution. Issue details are at
>>> http://tracker.ceph.com/issues/22102.
>>>
>>> Please add more details of your problem to the ticket.
>>>
>>> Mike
>>>
>>
>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
