Thanks for this info -- adding it to our list of reasons never to use
FileStore again.
In your case, are you able to migrate?


On Tue, Jul 14, 2020 at 3:13 PM Eric Smith <[email protected]> wrote:
>
> FWIW BlueStore is not affected by this problem!
>
> -----Original Message-----
> From: Eric Smith <[email protected]>
> Sent: Saturday, July 11, 2020 6:40 AM
> To: [email protected]
> Subject: [ceph-users] Re: Luminous 12.2.12 - filestore OSDs take an hour to boot
>
> It does appear that long file names are a real problem for FileStore. We
> have a cluster where 99% of the objects have names longer than the limit
> (220+? characters), so FileStore truncates the file name (as seen below,
> ending in "_<sha-sum>_0_long") and stores the full object name in xattrs
> for the object. During boot the OSD goes out to lunch for increasing
> amounts of time based on the number of on-disk objects that meet this
> criterion (with roughly 2.4 million such objects, the OSD takes over an
> hour to boot). I plan on testing this same scenario with BlueStore to see
> if it's also susceptible to these boot / read issues.
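>
> As a rough way to test this theory outside of the OSD, a Python sketch
> like the one below walks a PG's _head directory twice: once doing only
> readdir()s (about what find does), and once also reading an xattr per
> "_long" file, which is roughly the extra work the listing has to do to
> get the real object names back. The attribute key here is an assumption
> on my part and may differ by index version:
>
> #!/usr/bin/env python3
> # Rough sketch: compare a plain directory walk (what "find" does) with
> # a walk that also reads one xattr per truncated "_long" file, which is
> # approximately the extra work needed to recover full object names.
> # The attribute key below is an assumption; it may differ by index
> # version on your cluster.
> import os
> import sys
> import time
>
> pg_dir = sys.argv[1]  # a *_head PG directory under the OSD's current/
> ATTR = "user.cephos.lfn3"  # assumed key
>
> start = time.monotonic()
> paths = [os.path.join(root, name)
>          for root, _, files in os.walk(pg_dir)
>          for name in files]
> print(f"readdir only: {len(paths)} files in {time.monotonic() - start:.1f}s")
>
> start = time.monotonic()
> hits = 0
> for path in paths:
>     if path.endswith("_long"):
>         try:
>             os.getxattr(path, ATTR)
>             hits += 1
>         except OSError:
>             pass  # wrong key guess; still counts toward the timing
> print(f"plus getxattr on {hits} _long files: {time.monotonic() - start:.1f}s")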
>
> Eric
>
> -----Original Message-----
> From: Eric Smith <[email protected]>
> Sent: Friday, July 10, 2020 1:46 PM
> To: [email protected]
> Subject: [ceph-users] Re: Luminous 12.2.12 - filestore OSDs take an hour to boot
>
> For what it's worth - all of our objects are generating long-named object
> files like so...
>
> \uABCD\ucontent.\srecording\swzdchd\u\utnda-trg-1008007-wzdchd-216203706303281120-230932949-1593482400-159348660000000001\swzdchd\u\utpc2-tp1-1008007-wzdchd-216203706303281120-230932949-1593482400-159348660000000001\u\uwzdchd3._0bfd7c716b839cb7b3ad_0_long
>
> Does this matter? AFAICT FileStore sees this as a long file name and has
> to look up the object name in the xattrs? Is that bad?
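>
> The full object name can be pulled back out of the file's xattrs. Here's
> a rough sketch that does it; the user.cephos.lfn* attribute prefix is my
> assumption from the FileStore long-filename handling, so it dumps every
> matching attr rather than guessing one exact key:
>
> #!/usr/bin/env python3
> # Rough sketch: dump the full object name that FileStore stashed in
> # xattrs for a truncated "*_long" file. The exact attribute key may
> # vary by index version (assumption), so print every user.cephos.lfn*
> # attr rather than hard-coding one.
> import os
> import sys
>
> path = sys.argv[1]  # path to a *_long file on the OSD filesystem
> for attr in os.listxattr(path):
>     if attr.startswith("user.cephos.lfn"):
>         value = os.getxattr(path, attr)
>         print(f"{attr}: {value.decode('utf-8', 'replace')}")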
>
> -----Original Message-----
> From: Eric Smith <[email protected]>
> Sent: Friday, July 10, 2020 6:59 AM
> To: [email protected]
> Subject: [ceph-users] Luminous 12.2.12 - filestore OSDs take an hour to boot
>
> I have a cluster running Luminous 12.2.12 with FileStore, and it takes my
> OSDs somewhere around an hour to start (they do start successfully,
> eventually). I have the following log entries that seem to show the OSD
> process descending into the PG directory on disk and building an object
> list of some sort:
>
> 2020-07-09 18:29:28.017207 7f3b680afd80 20 osd.1 137390  clearing temps in 8.14ads3_head pgid 8.14ads3
> 2020-07-09 18:29:28.017211 7f3b680afd80 20 filestore(/var/lib/ceph/osd/ceph-1) collection_list(5012): pool is 8 shard is 3 pgid 8.14ads3
> 2020-07-09 18:29:28.017213 7f3b680afd80 10 filestore(/var/lib/ceph/osd/ceph-1) collection_list(5020): first checking temp pool
> 2020-07-09 18:29:28.017215 7f3b680afd80 20 filestore(/var/lib/ceph/osd/ceph-1) collection_list(5012): pool is -10 shard is 3 pgid 8.14ads3
> 2020-07-09 18:29:28.017221 7f3b680afd80 20 _collection_list_partial start:GHMIN end:GHMAX-64 ls.size 0
> 2020-07-09 18:29:28.017263 7f3b680afd80 20 filestore(/var/lib/ceph/osd/ceph-1) objects: []
> 2020-07-09 18:29:28.017268 7f3b680afd80 10 filestore(/var/lib/ceph/osd/ceph-1) collection_list(5028): fall through to non-temp collection, start 3#-1:00000000::::0#
> 2020-07-09 18:29:28.017272 7f3b680afd80 20 _collection_list_partial start:3#-1:00000000::::0# end:GHMAX-64 ls.size 0
> 2020-07-09 18:29:28.038124 7f3b680afd80 20 list_by_hash_bitwise prefix D
> 2020-07-09 18:29:28.058679 7f3b680afd80 20 list_by_hash_bitwise prefix DA
> 2020-07-09 18:29:28.069432 7f3b680afd80 20 list_by_hash_bitwise prefix DA4
> 2020-07-09 18:29:29.789598 7f3b51a87700 20 filestore(/var/lib/ceph/osd/ceph-1) sync_entry(4010): woke after 5.000074
> 2020-07-09 18:29:29.789634 7f3b51a87700 10 journal commit_start max_applied_seq 53085082, open_ops 0
> 2020-07-09 18:29:29.789639 7f3b51a87700 10 journal commit_start blocked, all open_ops have completed
> 2020-07-09 18:29:29.789641 7f3b51a87700 10 journal commit_start nothing to do
> 2020-07-09 18:29:29.789663 7f3b51a87700 20 filestore(/var/lib/ceph/osd/ceph-1) sync_entry(3994):  waiting for max_interval 5.000000
> 2020-07-09 18:29:34.789815 7f3b51a87700 20 filestore(/var/lib/ceph/osd/ceph-1) sync_entry(4010): woke after 5.000109
> 2020-07-09 18:29:34.789898 7f3b51a87700 10 journal commit_start max_applied_seq 53085082, open_ops 0
> 2020-07-09 18:29:34.789902 7f3b51a87700 10 journal commit_start blocked, all open_ops have completed
> 2020-07-09 18:29:34.789906 7f3b51a87700 10 journal commit_start nothing to do
> 2020-07-09 18:29:34.789939 7f3b51a87700 20 filestore(/var/lib/ceph/osd/ceph-1) sync_entry(3994):  waiting for max_interval 5.000000
> 2020-07-09 18:29:38.651689 7f3b680afd80 20 list_by_hash_bitwise prefix DA41
> 2020-07-09 18:29:39.790069 7f3b51a87700 20 filestore(/var/lib/ceph/osd/ceph-1) sync_entry(4010): woke after 5.000128
> 2020-07-09 18:29:39.790090 7f3b51a87700 10 journal commit_start max_applied_seq 53085082, open_ops 0
> 2020-07-09 18:29:39.790092 7f3b51a87700 10 journal commit_start blocked, all open_ops have completed
> 2020-07-09 18:29:39.790093 7f3b51a87700 10 journal commit_start nothing to do
> 2020-07-09 18:29:39.790102 7f3b51a87700 20 filestore(/var/lib/ceph/osd/ceph-1) sync_entry(3994):  waiting for max_interval 5.000000
> 2020-07-09 18:29:44.790200 7f3b51a87700 20 filestore(/var/lib/ceph/osd/ceph-1) sync_entry(4010): woke after 5.000095
> 2020-07-09 18:29:44.790256 7f3b51a87700 10 journal commit_start max_applied_seq 53085082, open_ops 0
> 2020-07-09 18:29:44.790265 7f3b51a87700 10 journal commit_start blocked, all open_ops have completed
> 2020-07-09 18:29:44.790268 7f3b51a87700 10 journal commit_start nothing to do
> 2020-07-09 18:29:44.790286 7f3b51a87700 20 filestore(/var/lib/ceph/osd/ceph-1) sync_entry(3994):  waiting for max_interval 5.000000
> 2020-07-09 18:29:49.790353 7f3b51a87700 20 filestore(/var/lib/ceph/osd/ceph-1) sync_entry(4010): woke after 5.000066
> 2020-07-09 18:29:49.790374 7f3b51a87700 10 journal commit_start max_applied_seq 53085082, open_ops 0
> 2020-07-09 18:29:49.790376 7f3b51a87700 10 journal commit_start blocked, all open_ops have completed
> 2020-07-09 18:29:49.790378 7f3b51a87700 10 journal commit_start nothing to do
> 2020-07-09 18:29:49.790387 7f3b51a87700 20 filestore(/var/lib/ceph/osd/ceph-1) sync_entry(3994):  waiting for max_interval 5.000000
> 2020-07-09 18:29:50.564479 7f3b680afd80 20 list_by_hash_bitwise prefix DA410000
> 2020-07-09 18:29:50.564501 7f3b680afd80 20 list_by_hash_bitwise prefix DA410000 ob 3#8:b5280000::::head#
> 2020-07-09 18:29:50.564508 7f3b680afd80 20 list_by_hash_bitwise prefix DA41002A
>
> Any idea what's going on here? I can run a find of every file on the
> filesystem in under 12 minutes, so I'm not sure what's taking so long.
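>
> (The 12-minute figure is from a plain find over the whole data dir; a
> quick sketch like the one below measures the same walk in one shot. The
> default path is the usual location and is an assumption about your
> layout.)
>
> #!/usr/bin/env python3
> # Rough sketch: time a bare directory walk over the OSD data dir -- the
> # same metadata-only work "find" does -- as a baseline to compare
> # against the OSD's boot-time listing. The default path is an
> # assumption.
> import os
> import sys
> import time
>
> osd_dir = sys.argv[1] if len(sys.argv) > 1 else "/var/lib/ceph/osd/ceph-1/current"
>
> start = time.monotonic()
> total = sum(len(files) for _, _, files in os.walk(osd_dir))
> print(f"walked {total} files in {time.monotonic() - start:.1f}s")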
>
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
