On Wed, Jan 17, 2018 at 11:45 AM, Martin Emrich <martin.emr...@empolis.com> wrote:
> Hi Orit!
>
> I did some tests, and indeed the combination of Versioning/Lifecycle with
> Resharding is the problem:
>
> - If I do not enable Versioning/Lifecycle, Autoresharding works fine.
> - If I disable Autoresharding but enable Versioning+Lifecycle, pushing
>   data works fine until I manually reshard. That hangs as well.

Thanks for testing :) This is very helpful!

> My lifecycle rule (which should remove all versions older than 60 days):
>
> {
>     "Rules": [{
>         "Status": "Enabled",
>         "Prefix": "",
>         "NoncurrentVersionExpiration": {
>             "NoncurrentDays": 60
>         },
>         "Expiration": {
>             "ExpiredObjectDeleteMarker": true
>         },
>         "ID": "expire-60days"
>     }]
> }
>
> I am currently testing with an application containing customer data, but I
> am also creating some random test data to generate logs I can share.
>
> I will also test whether the versioning itself is the culprit, or whether
> it is the lifecycle rule.

I suspect versioning (I have never tried it with resharding). Can you open
a tracker issue with all the information?

Thanks,
Orit

> Regards,
>
> Martin
>
> From: Orit Wasserman <owass...@redhat.com>
> Date: Tuesday, 16 January 2018, 18:38
> To: Martin Emrich <martin.emr...@empolis.com>
> Cc: ceph-users <ceph-users@lists.ceph.com>
> Subject: Re: [ceph-users] Bug in RadosGW resharding? Hangs again...
>
> Hi Martin,
>
> On Mon, Jan 15, 2018 at 6:04 PM, Martin Emrich <martin.emr...@empolis.com>
> wrote:
>
> > Hi!
> >
> > After having a completely broken radosgw setup due to damaged buckets, I
> > completely deleted all rgw pools and started from scratch.
> >
> > But my problem is reproducible: after pushing ca. 100000 objects into a
> > bucket, the resharding process appears to start, and the bucket becomes
> > unresponsive.
>
> Sorry to hear that.
> Can you share radosgw logs with --debug_rgw=20 --debug_ms=1?
> > I just see lots of these messages in all rgw logs:
> >
> > 2018-01-15 16:57:45.108826 7fd1779b1700  0 block_while_resharding ERROR: bucket is still resharding, please retry
> > 2018-01-15 16:57:45.119184 7fd1779b1700  0 NOTICE: resharding operation on bucket index detected, blocking
> > 2018-01-15 16:57:45.260751 7fd1120e6700  0 block_while_resharding ERROR: bucket is still resharding, please retry
> > 2018-01-15 16:57:45.280410 7fd1120e6700  0 NOTICE: resharding operation on bucket index detected, blocking
> > 2018-01-15 16:57:45.300775 7fd15b979700  0 block_while_resharding ERROR: bucket is still resharding, please retry
> > 2018-01-15 16:57:45.300971 7fd15b979700  0 WARNING: set_req_state_err err_no=2300 resorting to 500
> > 2018-01-15 16:57:45.301042 7fd15b979700  0 ERROR: RESTFUL_IO(s)->complete_header() returned err=Input/output error
> >
> > One radosgw process and two OSDs housing the bucket index/metadata are
> > still busy, but it seems to be stuck again.
> >
> > How long is this resharding process supposed to take? I cannot believe
> > that an application is supposed to block for more than half an hour...
> >
> > I feel inclined to open a bug report, but I am as yet unsure where the
> > problem lies.
> >
> > Some information:
> >
> > * 3 RGW processes, 3 OSD hosts with 12 HDD OSDs and 6 SSD OSDs
> > * Ceph 12.2.2
> > * Auto-resharding on, bucket versioning & lifecycle rule enabled
>
> What lifecycle rules do you use?
>
> Regards,
> Orit
>
> > Thanks,
> >
> > Martin
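For readers hitting the same hang: Luminous ships admin subcommands for inspecting (and cancelling) reshard operations, which can help gather the information requested above. A sketch, assuming a 12.2.x `radosgw-admin`; the bucket name `mybucket` and gateway instance name `gateway1` are placeholders:

```shell
# List pending and in-progress resharding operations in the zone
radosgw-admin reshard list

# Show reshard status for the affected bucket
radosgw-admin reshard status --bucket=mybucket

# Cancel a stuck reshard operation (verify availability on your exact version)
radosgw-admin reshard cancel --bucket=mybucket

# Possible workaround until the bug is resolved: disable dynamic resharding
# in ceph.conf under the rgw client section, then restart the gateways:
#   rgw_dynamic_resharding = false

# Capture the verbose logs Orit asked for from a running gateway
# (adjust the admin socket path to your deployment):
ceph daemon /var/run/ceph/ceph-client.rgw.gateway1.asok config set debug_rgw 20
ceph daemon /var/run/ceph/ceph-client.rgw.gateway1.asok config set debug_ms 1
```

These commands act on a live cluster, so treat them as a guide rather than a copy-paste recipe.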
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com