On Mon, Feb 8, 2021 at 3:11 PM Goffredo Baroncelli <kreij...@libero.it> wrote:
>
> On 2/7/21 11:06 PM, Chris Murphy wrote:
> > systemd-journald journals on Btrfs default to nodatacow, upon log
> > rotation it's submitted for defragmenting with BTRFS_IOC_DEFRAG. The
> > result looks curious. I can't tell what the logic is from the results.
> >
> > The journal file starts out being fallocated with a size of 8MB, and
> > as it grows there is an append of 8MB increments, also fallocated.
> > This leads to a filefrag -v that looks like this (ext4 and btrfs
> > nodatacow follow the same behavior, both are provided for reference):
> >
> > ext4
> > https://pastebin.com/6vuufwXt
> >
> > btrfs
> > https://pastebin.com/Y18B2m4h
> >
> > Following defragment with BTRFS_IOC_DEFRAG it looks like this:
> > https://pastebin.com/1ufErVMs
> >
> > It appears at first glance to be significantly more fragmented. Closer
> > inspection shows that most of the extents weren't relocated. But
> > what's up with the peculiar interleaving? Is this an improvement over
> > the original allocation?
>
> I am not sure how read the filefrag output: I see several lines like
> [...]
>    5:     1691..    1693:     125477..    125479:      3:
>    6:     1694..    1694:     125480..    125480:      1:             unwritten
> [...]
>
> What means "unwritten" ? The kernel documentation [*] says:

My understanding is that it's an extent that's been fallocated but not
yet written to. What I don't know is whether such extents are tripping
up BTRFS_IOC_DEFRAG. I'm not skilled enough to create a bunch of these
journal logs quickly (I'd have to let a system run and age its own
journals, which takes forever), nor to write a small program that runs
the same file through BTRFS_IOC_DEFRAG twice to see if it's idempotent.
The resulting file after one submission does not have unwritten extents.

Another thing I'm not sure of is whether ssd vs nossd affects the
defrag results. Or datacow versus nodatacow.

I'm also not sure whether autodefrag is a better solution to the
problem: it would act as a no-op when the file is nodatacow, and do the
expected thing if it's datacow. But then we'd need an autodefrag xattr
to set on the enclosing directory for these journals, because there's
no reliable way to set the autodefrag mount option globally without
knowing all the workloads. It can make some workloads worse.

> My educate guess is that there is something strange in the sequence:
> - write
> - sync
> - close log
> - move log
> - defrag log
>
> May be the defrag starts before all the data reach the platters ?

Perhaps. Here is an strace attached to journald before --rotate, and
then the --rotate itself:
https://pastebin.com/UGihfCG9

>
> For what matters, I create a file with the same fragmentation like your one
>
> $ sudo filefrag -v data.txt
> Filesystem type is: 9123683e
> File size of data.txt is 25165824 (6144 blocks of 4096 bytes)
>  ext:     logical_offset:        physical_offset: length:   expected: flags:
>    0:        0..       0:    1597171..   1597171:      1:
>    1:        1..    1599:  163433285.. 163434883:   1599:    1597172:
>    2:     1600..    1607:    1601255..   1601262:      8:  163434884:
>    3:     1608..    1689:    1604137..   1604218:     82:    1601263:
>    4:     1690..    1690:    1597484..   1597484:      1:    1604219:
>    5:     1691..    1693:    1597465..   1597467:      3:    1597485:
>    6:     1694..    1694:    1597966..   1597966:      1:    1597468:
>    7:     1695..    1722:    1599557..   1599584:     28:    1597967:
>    8:     1723..    1723:    1599211..   1599211:      1:    1599585:
>    9:     1724..    1955:    1648394..   1648625:    232:    1599212:
>   10:     1956..    1956:    1599695..   1599695:      1:    1648626:
>   11:     1957..    2047:    1625881..   1625971:     91:    1599696:
>   12:     2048..    2417:    1648804..   1649173:    370:    1625972:
>   13:     2418..    2420:    1597468..   1597470:      3:    1649174:
>   14:     2421..    2478:    1624667..   1624724:     58:    1597471:
>   15:     2479..    2479:    1596416..   1596416:      1:    1624725:
>   16:     2480..    2482:    1601045..   1601047:      3:    1596417:
>   17:     2483..    2483:    1596854..   1596854:      1:    1601048:
>   18:     2484..    2523:    1602715..   1602754:     40:    1596855:
>   19:     2524..    2527:    1597471..   1597474:      4:    1602755:
>   20:     2528..    2598:    1624725..   1624795:     71:    1597475:
>   21:     2599..    2599:    1596858..   1596858:      1:    1624796:
>   22:     2600..    2607:    1601263..   1601270:      8:    1596859:
>   23:     2608..    2608:    1596863..   1596863:      1:    1601271:
>   24:     2609..    2611:    1601271..   1601273:      3:    1596864:
>   25:     2612..    2612:    1596864..   1596864:      1:    1601274:
>   26:     2613..    2615:    1601274..   1601276:      3:    1596865:
>   27:     2616..    2616:    1596981..   1596981:      1:    1601277:
>   28:     2617..    2691:    1649174..   1649248:     75:    1596982:
>   29:     2692..    2696:    1597475..   1597479:      5:    1649249:
>   30:     2697..    2756:    1634995..   1635054:     60:    1597480:
>   31:     2757..    2758:    1597480..   1597481:      2:    1635055:
>   32:     2759..    2762:    1601351..   1601354:      4:    1597482:
>   33:     2763..    2764:    1597482..   1597483:      2:    1601355:
>   34:     2765..    2837:    1649249..   1649321:     73:    1597484:
>   35:     2838..    2838:    1597038..   1597038:      1:    1649322:
>   36:     2839..    2855:    1601538..   1601554:     17:    1597039:
>   37:     2856..    2856:    1597045..   1597045:      1:    1601555:
>   38:     2857..    2904:    1624547..   1624594:     48:    1597046:
>   39:     2905..    2926:    1600795..   1600816:     22:    1624595:
>   40:     2927..    2942:    1602034..   1602049:     16:    1600817:
>   41:     2943..    2963:    1600817..   1600837:     21:    1602050:
>   42:     2964..    2979:    1602183..   1602198:     16:    1600838:
>   43:     2980..    3001:    1600927..   1600948:     22:    1602199:
>   44:     3002..    3043:    1621164..   1621205:     42:    1600949:
>   45:     3044..    3053:    1599231..   1599240:     10:    1621206:
>   46:     3054..    3066:    1601952..   1601964:     13:    1599241:
>   47:     3067..    3067:    1597056..   1597056:      1:    1601965:
>   48:     3068..    3084:    1602375..   1602391:     17:    1597057:
>   49:     3085..    3094:    1599290..   1599299:     10:    1602392:
>   50:     3095..    3096:    1601355..   1601356:      2:    1599300:
>   51:     3097..    3107:    1600717..   1600727:     11:    1601357:
>   52:     3108..    3156:    1642892..   1642940:     49:    1600728:
>   53:     3157..    3157:    1597059..   1597059:      1:    1642941:
>   54:     3158..    3251:    1649322..   1649415:     94:    1597060:
>   55:     3252..    3254:    1599241..   1599243:      3:    1649416:
>   56:     3255..    3304:    1645466..   1645515:     50:    1599244:
>   57:     3305..    3305:    1597100..   1597100:      1:    1645516:
>   58:     3306..    3312:    1601357..   1601363:      7:    1597101:
>   59:     3313..    3319:    1599300..   1599306:      7:    1601364:
>   60:     3320..    3331:    1601611..   1601622:     12:    1599307:
>   61:     3332..    3339:    1600838..   1600845:      8:    1601623:
>   62:     3340..    3343:    1601419..   1601422:      4:    1600846:
>   63:     3344..    3351:    1600846..   1600853:      8:    1601423:
>   64:     3352..    3432:    1649416..   1649496:     81:    1600854:
>   65:     3433..    3433:    1597109..   1597109:      1:    1649497:
>   66:     3434..    3489:    1649497..   1649552:     56:    1597110:
>   67:     3490..    3491:    1599227..   1599228:      2:    1649553:
>   68:     3492..    3521:    1619348..   1619377:     30:    1599229:
>   69:     3522..    3523:    1599307..   1599308:      2:    1619378:
>   70:     3524..    3530:    1601688..   1601694:      7:    1599309:
>   71:     3531..    3539:    1600949..   1600957:      9:    1601695:
>   72:     3540..    3579:    1629356..   1629395:     40:    1600958:
>   73:     3580..    3580:    1597124..   1597124:      1:    1629396:
>   74:     3581..    3601:    1604219..   1604239:     21:    1597125:
>   75:     3602..    3603:    1599585..   1599586:      2:    1604240:
>   76:     3604..    3614:    1602636..   1602646:     11:    1599587:
>   77:     3615..    3616:    1599587..   1599588:      2:    1602647:
>   78:     3617..    3677:    1649553..   1649613:     61:    1599589:
>   79:     3678..    3680:    1599692..   1599694:      3:    1649614:
>   80:     3681..    3723:    1647818..   1647860:     43:    1599695:
>   81:     3724..    3726:    1599821..   1599823:      3:    1647861:
>   82:     3727..    3756:    1622218..   1622247:     30:    1599824:
>   83:     3757..    3759:    1600630..   1600632:      3:    1622248:
>   84:     3760..    3766:    1603288..   1603294:      7:    1600633:
>   85:     3767..    3768:    1600633..   1600634:      2:    1603295:
>   86:     3769..    3950:   76053306..  76053487:    182:    1600635:
>   87:     3951..    3958:    1600958..   1600965:      8:   76053488:
>   88:     3959..    3986:    1619921..   1619948:     28:    1600966:
>   89:     3987..    3995:    1600966..   1600974:      9:    1619949:
>   90:     3996..    4036:    1649614..   1649654:     41:    1600975:
>   91:     4037..    4045:    1600975..   1600983:      9:    1649655:
>   92:     4046..    4050:    1601423..   1601427:      5:    1600984:
>   93:     4051..    4052:    1600854..   1600855:      2:    1601428:
>   94:     4053..    4055:    1601555..   1601557:      3:    1600856:
>   95:     4056..    4056:    1597129..   1597129:      1:    1601558:
>   96:     4057..    4059:    1601745..   1601747:      3:    1597130:
>   97:     4060..    4060:    1597134..   1597134:      1:    1601748:
>   98:     4061..    4063:    1602050..   1602052:      3:    1597135:
>   99:     4064..    4064:    1597137..   1597137:      1:    1602053:
>  100:     4065..    4079:    1604297..   1604311:     15:    1597138:
>  101:     4080..    4088:    1600987..   1600995:      9:    1604312:
>  102:     4089..    4095:    1603295..   1603301:      7:    1600996:
>  103:     4096..    4106:    1600996..   1601006:     11:    1603302:
>  104:     4107..    4117:    1622600..   1622610:     11:    1601007:
>  105:     4118..    4119:    1601007..   1601008:      2:    1622611:
>  106:     4120..    4129:    1622611..   1622620:     10:    1601009:
>  107:     4130..    4131:    1601009..   1601010:      2:    1622621:
>  108:     4132..    4141:    1622621..   1622630:     10:    1601011:
>  109:     4142..    4145:    1601011..   1601014:      4:    1622631:
>  110:     4146..    4155:    1622986..   1622995:     10:    1601015:
>  111:     4156..    4157:    1601015..   1601016:      2:    1622996:
>  112:     4158..    4168:    1622996..   1623006:     11:    1601017:
>  113:     4169..    4170:    1601017..   1601018:      2:    1623007:
>  114:     4171..    4180:    1623007..   1623016:     10:    1601019:
>  115:     4181..    4182:    1601019..   1601020:      2:    1623017:
>  116:     4183..    4192:    1624473..   1624482:     10:    1601021:
>  117:     4193..    4195:    1601021..   1601023:      3:    1624483:
>  118:     4196..    4205:    1624796..   1624805:     10:    1601024:
>  119:     4206..    4207:    1601024..   1601025:      2:    1624806:
>  120:     4208..    4217:    1624806..   1624815:     10:    1601026:
>  121:     4218..    4220:    1601026..   1601028:      3:    1624816:
>  122:     4221..    4230:    1625972..   1625981:     10:    1601029:
>  123:     4231..    4408:    1648626..   1648803:    178:    1625982:
>  124:     4409..    4411:    1602199..   1602201:      3:    1648804:
>  125:     4412..    4434:    1601328..   1601350:     23:    1602202:
>  126:     4435..    4437:    1602647..   1602649:      3:    1601351:
>  127:     4438..    4439:    1601029..   1601030:      2:    1602650:
>  128:     4440..    4442:    1602755..   1602757:      3:    1601031:
>  129:     4443..    4480:    1601650..   1601687:     38:    1602758:
>  130:     4481..    4491:    1629530..   1629540:     11:    1601688:
>  131:     4492..    4560:    1624404..   1624472:     69:    1629541:
>  132:     4561..    4571:    1629541..   1629551:     11:    1624473:
>  133:     4572..    4582:    1601031..   1601041:     11:    1629552:
>  134:     4583..    4586:    1603302..   1603305:      4:    1601042:
>  135:     4587..    4620:    1602537..   1602570:     34:    1603306:
>  136:     4621..    4631:    1629716..   1629726:     11:    1602571:
>  137:     4632..    4634:    1601042..   1601044:      3:    1629727:
>  138:     4635..    6143:  156004864.. 156006372:   1509:    1601045: last,eof
> data.txt: 139 extents found
>
> the I tried to defrag it
>
> $ btrfs fi defra data.txt
> $ sudo filefrag -v data.txt
> Filesystem type is: 9123683e
> File size of data.txt is 25165824 (6144 blocks of 4096 bytes)
>  ext:     logical_offset:        physical_offset: length:   expected: flags:
>    0:        0..    6143:  164002967.. 164009110:   6144:             last,eof
> data.txt: 1 extent found
>
> So it seems that the defrag works

I get different results between BTRFS_IOC_DEFRAG, which is what
systemd-journald uses, and BTRFS_IOC_DEFRAG_RANGE, which is what
'btrfs fi defrag' uses with a default target extent size (-t) of 32M.

Another question about BTRFS_IOC_DEFRAG: is it intended to be
minimalist? Does it have a way to estimate fragmentation and just not
do anything? I ask because the journald nodatacow journals are not
meaningfully fragmented. They look the same on ext4 and on Btrfs: it's
(so far) always 8MB extents, directly related to each fallocate grow of
the journal file. I think this kind of faux-fragmentation is minor even
on an HDD, because it's the same on ext4 and XFS and no one complains
there (as far as I'm aware).

-- 
Chris Murphy
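
For the "run the same file through BTRFS_IOC_DEFRAG twice" test mentioned
above, a minimal sketch in Python might look like the following. It is
only a sketch under stated assumptions: the ioctl numbers are computed
with the asm-generic encoding (x86, ARM; some other architectures
differ), the struct sizes are taken from linux/btrfs.h and
linux/fiemap.h as I understand them, the helper names (extent_map,
defrag_twice, parse_extents) are made up for illustration, and passing 0
as the ioctl argument is assumed to be equivalent to the NULL argument
journald is believed to pass. It has not been run against real journal
files.

```python
import fcntl
import os
import struct
import sys


def _ioc(direction, ioc_type, nr, size):
    """Encode an ioctl number using the asm-generic layout (assumption)."""
    return (direction << 30) | (size << 16) | (ioc_type << 8) | nr


_IOC_WRITE, _IOC_READ = 1, 2
# struct btrfs_ioctl_vol_args is 4096 bytes (__s64 fd + char name[4088]).
BTRFS_IOC_DEFRAG = _ioc(_IOC_WRITE, 0x94, 2, 4096)
# struct btrfs_ioctl_defrag_range_args is 48 bytes (shown for comparison only).
BTRFS_IOC_DEFRAG_RANGE = _ioc(_IOC_WRITE, 0x94, 16, 48)
# struct fiemap header is 32 bytes.
FS_IOC_FIEMAP = _ioc(_IOC_READ | _IOC_WRITE, ord("f"), 11, 32)

FIEMAP_FLAG_SYNC = 0x1          # flush the file before mapping
FIEMAP_EXTENT_UNWRITTEN = 0x800  # fallocated but not yet written
HEADER = struct.Struct("=QQIIII")      # start, length, flags, mapped, count, reserved
EXTENT = struct.Struct("=QQQ16xI12x")  # logical, physical, length, flags


def parse_extents(buf):
    """Return (logical, physical, length, flags) tuples from a FIEMAP buffer."""
    mapped = HEADER.unpack_from(buf, 0)[3]
    return [EXTENT.unpack_from(buf, HEADER.size + i * EXTENT.size)
            for i in range(mapped)]


def extent_map(fd):
    """Fetch the full extent list of an open file via FS_IOC_FIEMAP."""
    # First call with fm_extent_count == 0 only reports how many extents exist.
    probe = bytearray(HEADER.pack(0, 2**64 - 1, FIEMAP_FLAG_SYNC, 0, 0, 0))
    fcntl.ioctl(fd, FS_IOC_FIEMAP, probe)
    count = HEADER.unpack_from(probe)[3]
    # Second call provides room for that many extent records.
    buf = bytearray(HEADER.pack(0, 2**64 - 1, FIEMAP_FLAG_SYNC, 0, count, 0))
    buf += bytes(count * EXTENT.size)
    fcntl.ioctl(fd, FS_IOC_FIEMAP, buf)
    return parse_extents(buf)


def defrag_twice(path):
    """Return the extent maps before, after one, and after two defrag passes."""
    fd = os.open(path, os.O_RDWR)  # defrag needs a writable descriptor
    try:
        maps = [extent_map(fd)]
        for _ in range(2):
            fcntl.ioctl(fd, BTRFS_IOC_DEFRAG, 0)  # 0 stands in for NULL
            os.fsync(fd)
            maps.append(extent_map(fd))
        return maps
    finally:
        os.close(fd)


if __name__ == "__main__" and len(sys.argv) == 2:
    before, once, twice = defrag_twice(sys.argv[1])
    for label, m in (("before", before), ("after 1", once), ("after 2", twice)):
        unwritten = sum(1 for e in m if e[3] & FIEMAP_EXTENT_UNWRITTEN)
        print(f"{label}: {len(m)} extents, {unwritten} unwritten")
    print("idempotent" if once == twice else "second pass changed the layout")
```

Run it as root against a copy of a rotated journal file. Comparing the
full extent tuples rather than just the extent count is deliberate, since
a second pass could move extents without changing how many there are.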