Re: [PATCH RESEND 0/8] btrfs-progs: sub: Relax the privileges of "subvolume list/show"

2018-11-27 Thread Martin Steigerwald
Misono Tomohiro - 27.11.18, 06:24:
> Importantly, in order to make output consistent for both root and
> non-privileged user, this changes the behavior of "subvolume list":
>  - (default) Only list subvolumes under the specified path.
>    The path needs to be a subvolume.

Does that work recursively?

I would find it quite unexpected if doing btrfs subvol list in or on the 
root directory of a BTRFS filesystem did not display any subvolumes on 
that filesystem, no matter where they are.
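
For reference, the behaviour I rely on today, as far as I remember it 
(mount point illustrative):

% btrfs subvolume list /mnt      # lists every subvolume of the filesystem, wherever it lives
% btrfs subvolume list -o /mnt   # lists only subvolumes below the given path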

Thanks,
-- 
Martin




Re: Interpreting `btrfs filesystem show'

2018-10-15 Thread Martin Steigerwald
Hugo Mills - 15.10.18, 16:26:
> On Mon, Oct 15, 2018 at 05:24:08PM +0300, Anton Shepelev wrote:
> > Hello, all
> > 
> > While trying to resolve free space problems, I found that
> > I cannot interpret the output of:
> > > btrfs filesystem show
> > 
> > Label: none  uuid: 8971ce5b-71d9-4e46-ab25-ca37485784c8
> > Total devices 1 FS bytes used 34.06GiB
> > devid 1 size 40.00GiB used 37.82GiB path /dev/sda2
> > 
> > How come the total used value is less than the value listed
> > for the only device?
> 
> "Used" on the device is the amount of space allocated. "Used" on the
> FS is the total amount of actual data and metadata in that
> allocation.
> 
> You will also need to look at the output of "btrfs fi df" to see
> the breakdown of the 37.82 GiB into data, metadata and currently
> unused.

I usually use btrfs fi usage -T, because:

1. It has all the information.

2. It differentiates between used and allocated.

% btrfs fi usage -T /
Overall:
Device size: 100.00GiB
Device allocated: 54.06GiB
Device unallocated:   45.94GiB
Device missing:  0.00B
Used: 46.24GiB
Free (estimated): 25.58GiB  (min: 25.58GiB)
Data ratio:   2.00
Metadata ratio:   2.00
Global reserve:   70.91MiB  (used: 0.00B)

                             Data       Metadata   System
Id Path                      RAID1      RAID1      RAID1     Unallocated
-- ------------------------- ---------- ---------- --------- -----------
 2 /dev/mapper/msata-debian    25.00GiB    2.00GiB  32.00MiB    22.97GiB
 1 /dev/mapper/sata-debian     25.00GiB    2.00GiB  32.00MiB    22.97GiB
-- ------------------------- ---------- ---------- --------- -----------
   Total                       25.00GiB    2.00GiB  32.00MiB    45.94GiB
   Used                        22.38GiB  754.66MiB  16.00KiB


For RAID it reports the raw size in some places and the logical size in 
others. Especially in the "Total" line I find this a bit inconsistent: 
the "RAID1" columns show logical size, while "Unallocated" shows raw size.

Also "Used:" in the global section shows raw size and "Free 
(estimated):" shows logical size.
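
To illustrate with the numbers above, the two views are consistent 
(arithmetic approximate):

Used (raw)               ~= (22.38 GiB data + 0.74 GiB metadata) * 2.00 ratio ~= 46.24 GiB
Free (estimated, logical) ~= (25.00 - 22.38) GiB free in data chunks + 45.94 GiB unallocated / 2.00 ~= 25.58 GiB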

Thanks
-- 
Martin




Re: BTRFS related kernel backtrace on boot on 4.18.7 after blackout due to discharged battery

2018-10-05 Thread Martin Steigerwald
Filipe Manana - 05.10.18, 17:21:
> On Fri, Oct 5, 2018 at 3:23 PM Martin Steigerwald wrote:
> > Hello!
> > 
> > This happened on a ThinkPad T520 after the battery was discharged and
> > the machine just blacked out.
> > 
> > Is that some sign of regular consistency check / replay or something
> > to investigate further?
> 
> I think it's harmless. If anything were messed up with link counts or
> mismatches between those and dir entries, fsck (btrfs check) should
> have reported something.
> I'll dig a bit further and remove the warning if it's really harmless.

I just scrubbed the filesystem. I did not run btrfs check on it.
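
For completeness, a read-only check would look roughly like this, run from 
a rescue system with the filesystem unmounted (device path illustrative):

% btrfs check --readonly /dev/mapper/sata-debian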

> > I already scrubbed all data and there are no errors. Also btrfs
> > device stats reports no errors. SMART status appears to be okay as
> > well on both SSD.
> > 
> > [4.524355] BTRFS info (device dm-4): disk space caching is
> > enabled [… backtrace …]
-- 
Martin




BTRFS related kernel backtrace on boot on 4.18.7 after blackout due to discharged battery

2018-10-05 Thread Martin Steigerwald
Hello!

This happened on a ThinkPad T520 after the battery was discharged and the
machine just blacked out.

Is that some sign of regular consistency check / replay or something to
investigate further?

I already scrubbed all data and there are no errors. Also btrfs device stats
reports no errors. SMART status appears to be okay as well on both SSDs.

[4.524355] BTRFS info (device dm-4): disk space caching is enabled
[4.524356] BTRFS info (device dm-4): has skinny extents
[4.563950] BTRFS info (device dm-4): enabling ssd optimizations
[5.463085] Console: switching to colour frame buffer device 240x67
[5.492236] i915 :00:02.0: fb0: inteldrmfb frame buffer device
[5.882661] BTRFS info (device dm-3): disk space caching is enabled
[5.882664] BTRFS info (device dm-3): has skinny extents
[5.918579] SGI XFS with ACLs, security attributes, realtime, scrub, no 
debug enabled
[5.927421] Adding 20971516k swap on /dev/mapper/sata-swap.  Priority:-2 
extents:1 across:20971516k SSDsc
[5.935051] XFS (sdb1): Mounting V5 Filesystem
[5.935218] XFS (sda1): Mounting V5 Filesystem
[5.961100] XFS (sda1): Ending clean mount
[5.970857] BTRFS info (device dm-3): enabling ssd optimizations
[5.972358] XFS (sdb1): Ending clean mount
[5.975955] WARNING: CPU: 1 PID: 1104 at fs/inode.c:342 inc_nlink+0x28/0x30
[5.978271] Modules linked in: xfs msr pktcdvd intel_rapl 
x86_pkg_temp_thermal intel_powerclamp kvm_intel kvm irqbypass crct10dif_pclmul 
crc32_pclmul ghash_clmulni_intel snd_hda_codec_hdmi pcbc arc4 
snd_hda_codec_conexant snd_hda_codec_generic iwldvm mac80211 iwlwifi 
aesni_intel snd_hda_intel snd_hda_codec aes_x86_64 crypto_simd cryptd 
snd_hda_core glue_helper intel_cstate snd_hwdep intel_rapl_perf snd_pcm pcspkr 
input_leds i915 sg cfg80211 snd_timer thinkpad_acpi nvram drm_kms_helper snd 
soundcore tpm_tis tpm_tis_core drm rfkill ac tpm i2c_algo_bit fb_sys_fops 
battery rng_core video syscopyarea sysfillrect sysimgblt button evdev sbs sbshc 
coretemp bfq hdaps(O) tp_smapi(O) thinkpad_ec(O) loop ecryptfs cbc sunrpc 
mcryptd sha256_ssse3 sha256_generic encrypted_keys ip_tables x_tables autofs4 
dm_mod
[5.990499]  btrfs xor zstd_decompress zstd_compress xxhash zlib_deflate 
raid6_pq libcrc32c crc32c_generic sr_mod cdrom sd_mod hid_lenovo hid_generic 
usbhid hid ahci libahci libata ehci_pci crc32c_intel psmouse i2c_i801 sdhci_pci 
cqhci lpc_ich sdhci ehci_hcd e1000e scsi_mod i2c_core mfd_core mmc_core usbcore 
usb_common thermal
[5.990529] CPU: 1 PID: 1104 Comm: mount Tainted: G   O  
4.18.7-tp520 #63
[5.990532] Hardware name: LENOVO 42433WG/42433WG, BIOS 8AET69WW (1.49 ) 
06/14/2018
[6.000153] RIP: 0010:inc_nlink+0x28/0x30
[6.000154] Code: 00 00 8b 47 48 85 c0 74 07 83 c0 01 89 47 48 c3 f6 87 a1 
00 00 00 04 74 11 48 8b 47 28 f0 48 ff 88 98 04 00 00 8b 47 48 eb df <0f> 0b eb 
eb 0f 1f 40 00 41 54 8b 0d 70 3f aa 00 48 ba eb 83 b5 80 
[6.008573] RSP: 0018:c90002283828 EFLAGS: 00010246
[6.008575] RAX:  RBX: 8804018bed58 RCX: 00022261
[6.008576] RDX: 00022251 RSI:  RDI: 8804018bed58
[6.008577] RBP: c90002283a50 R08: 0002a330 R09: a02f3873
[6.008578] R10: 30ff R11: 7763 R12: 0011
[6.008579] R13: 3d5f R14: 880403e19800 R15: 88040a3c69a0
[6.008580] FS:  7f071598f100() GS:88041e24() 
knlGS:
[6.008581] CS:  0010 DS:  ES:  CR0: 80050033
[6.008589] CR2: 7fda4fbf8218 CR3: 000403e42001 CR4: 000606e0
[6.008590] Call Trace:
[6.008614]  replay_one_buffer+0x80e/0x890 [btrfs]
[6.008632]  walk_up_log_tree+0x1dc/0x260 [btrfs]
[6.046858]  walk_log_tree+0xaf/0x1e0 [btrfs]
[6.046872]  btrfs_recover_log_trees+0x21c/0x410 [btrfs]
[6.046885]  ? btree_read_extent_buffer_pages+0xcd/0x210 [btrfs]
[6.055941]  ? fixup_inode_link_counts+0x170/0x170 [btrfs]
[6.055953]  open_ctree+0x1a0d/0x1b60 [btrfs]
[6.055965]  btrfs_mount_root+0x67b/0x760 [btrfs]
[6.065039]  ? pcpu_alloc_area+0xdd/0x120
[6.065040]  ? pcpu_next_unpop+0x32/0x40
[6.065052]  mount_fs+0x36/0x162
[6.065055]  vfs_kern_mount.part.34+0x4f/0x120
[6.065064]  btrfs_mount+0x15f/0x890 [btrfs]
[6.065067]  ? pcpu_cnt_pop_pages+0x40/0x50
[6.065069]  ? pcpu_alloc_area+0xdd/0x120
[6.065071]  ? pcpu_next_unpop+0x32/0x40
[6.065073]  ? cpumask_next+0x16/0x20
[6.065075]  ? pcpu_alloc+0x1c3/0x690
[6.065078]  ? mount_fs+0x36/0x162
[6.099840]  mount_fs+0x36/0x162
[6.099843]  vfs_kern_mount.part.34+0x4f/0x120
[6.099845]  do_mount+0x1f7/0xc80
[6.099856]  ksys_mount+0xb5/0xd0
[6.099859]  __x64_sys_mount+0x1c/0x20
[6.099861]  do_syscall_64+0x43/0xd0
[6.099864]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[6.099866] RIP: 0033:0x7f0715b89a1a
[6.099867] Code: 48 8b 0d 71 e4 0b 00 f7 d8 64 89 01 48 83 c8 

Re: very poor performance / a lot of writes to disk with space_cache (but not with space_cache=v2)

2018-09-19 Thread Martin Steigerwald
Hans van Kranenburg - 19.09.18, 19:58:
> > However, as soon as we remount the filesystem with space_cache=v2 -
> 
> > writes drop to just around 3-10 MB/s to each disk. If we remount to
> > space_cache - lots of writes, system unresponsive. Again remount to
> > space_cache=v2 - low writes, system responsive.
> > 
> > That's a huuge, 10x overhead! Is it expected? Especially that
> > space_cache=v1 is still the default mount option?
> 
> Yes, that does not surprise me.
> 
> https://events.static.linuxfound.org/sites/events/files/slides/vault20
> 16_0.pdf
> 
> Free space cache v1 is the default because of issues with btrfs-progs,
> not because it's unwise to use the kernel code. I can totally
> recommend using it. The linked presentation above gives some good
> background information.

What issues in btrfs-progs are those?

I am wondering whether to switch to the free space tree (v2). Would it 
provide a benefit for regular / and /home filesystems on a dual SSD BTRFS 
RAID-1 in a laptop?
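
As far as I understand, switching is a one-time mount with the option, 
after which the free space tree is used automatically on later mounts 
(device path and mount point illustrative):

% mount -o space_cache=v2 /dev/mapper/sata-home /home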

Thanks,
-- 
Martin




Re: lazytime mount option—no support in Btrfs

2018-08-19 Thread Martin Steigerwald
waxhead - 18.08.18, 22:45:
> Adam Hunt wrote:
> > Back in 2014 Ted Tso introduced the lazytime mount option for ext4
> > and shortly thereafter a more generic VFS implementation which was
> > then merged into mainline. His early patches included support for
> > Btrfs but those changes were removed prior to the feature being
> > merged. His
> > changelog includes the following note about the removal:
> >    - Per Christoph's suggestion, drop support for btrfs and xfs for now,
> >      issues with how btrfs and xfs handle dirty inode tracking.  We
> >      can add btrfs and xfs support back later or at the end of this
> >      series if we want to revisit this decision.
> > 
> > My reading of the current mainline shows that Btrfs still lacks any
> > support for lazytime. Has any thought been given to adding support
> > for lazytime to Btrfs?
[…]
> Is there any news regarding this?

I´d like to know whether there is any news about this as well.

If I understand it correctly, this could even help BTRFS performance a 
lot, because it is COW´ing metadata.

Thanks,
-- 
Martin




Re: Experiences on BTRFS Dual SSD RAID 1 with outage of one SSD

2018-08-18 Thread Martin Steigerwald
Roman Mamedov - 18.08.18, 09:12:
> On Fri, 17 Aug 2018 23:17:33 +0200
> Martin Steigerwald wrote:
> > > Do not consider SSD "compression" as a factor in any of your
> > > calculations or planning. Modern controllers do not do it anymore,
> > > the last ones that did are SandForce, and that's 2010 era stuff.
> > > You
> > > can check for yourself by comparing write speeds of compressible
> > > vs
> > > incompressible data, it should be the same. At most, the modern
> > > ones
> > > know to recognize a stream of binary zeroes and have a special
> > > case
> > > for that.
> > 
> > Interesting. Do you have any backup for your claim?
> 
> Just "something I read". I follow quite a bit of SSD-related articles
> and reviews which often also include a section to talk about the
> controller utilized, its background and technological
> improvements/changes -- and the compression going out of fashion
> after SandForce seems to be considered a well-known fact.
> 
> Incidentally, your old Intel 320 SSDs actually seem to be based on
> that old SandForce controller (or at least license some of that IP to
> extend on it), and hence those indeed might perform compression.

Interesting. Back then I read that the Intel SSD 320 would not compress.
I think it is difficult to know for sure with those proprietary controllers.

> > As the data still needs to be transferred to the SSD at least when
> > the SATA connection is maxed out I bet you won´t see any difference
> > in write speed whether the SSD compresses in real time or not.
> 
> Most controllers expose two readings in SMART:
> 
>   - Lifetime writes from host (SMART attribute 241)
>   - Lifetime writes to flash (attribute 233, or 177, or 173...)
>
> It might be difficult to get the second one, as often it needs to be
> decoded from others such as "Average block erase count" or "Wear
> leveling count". (And seems to be impossible on Samsung NVMe ones,
> for example)

I got the impression every manufacturer does their own thing here. And I
would not even be surprised if it is different between different generations
of SSDs by one manufacturer.
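
For what it is worth, the rough estimate I would do from those two readings, 
if a drive exposes both (attribute numbers and units vary per vendor, so 
this is only a sketch):

% smartctl -A /dev/sda
# write amplification ~= lifetime writes to flash / lifetime writes from host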

# Crucial mSATA

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0
  5 Reallocated_Sector_Ct   0x0033   100   100   000    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       16345
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       4193
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0
173 Wear_Leveling_Count     0x0032   078   078   000    Old_age   Always       -       663
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       362
180 Unused_Rsvd_Blk_Cnt_Tot 0x0033   000   000   000    Pre-fail  Always       -       8219
183 SATA_Iface_Downshift    0x0032   100   100   000    Old_age   Always       -       1
184 End-to-End_Error        0x0032   100   100   000    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0022   046   020   000    Old_age   Always       -       54 (Min/Max -10/80)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       16
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0
202 Percent_Lifetime_Used   0x0031   078   078   000    Pre-fail  Offline      -       22

I expect the raw value of this to rise more slowly now that there are almost
100 GiB completely unused and there is lots of free space in the filesystems.
But even if not, the SSD has been in use since March 2014, so it has plenty
of time to go.

206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0
246 Total_Host_Sector_Write 0x0032   100   100   ---    Old_age   Always       -       91288276930

^^ In sectors. 91288276930 * 512 / 1024 / 1024 / 1024 ~= 43529 GiB

Could be 4 KiB… but as it is talking about Host_Sector and the value multiplied
by eight does not make any sense, I bet it is 512 bytes.

% smartctl /dev/sdb --all |grep "

Re: Experiences on BTRFS Dual SSD RAID 1 with outage of one SSD

2018-08-17 Thread Martin Steigerwald
Austin S. Hemmelgarn - 17.08.18, 14:55:
> On 2018-08-17 08:28, Martin Steigerwald wrote:
> > Thanks for your detailed answer.
> > 
> > Austin S. Hemmelgarn - 17.08.18, 13:58:
> >> On 2018-08-17 05:08, Martin Steigerwald wrote:
[…]
> >>> Anyway, creating a new filesystem may have been better here
> >>> anyway,
> >>> cause it replaced an BTRFS that aged over several years with a new
> >>> one. Due to the increased capacity and due to me thinking that
> >>> Samsung 860 Pro compresses itself, I removed LZO compression. This
> >>> would also give larger extents on files that are not fragmented or
> >>> only slightly fragmented. I think that Intel SSD 320 did not
> >>> compress, but Crucial m500 mSATA SSD does. That has been the
> >>> secondary SSD that still had all the data after the outage of the
> >>> Intel SSD 320.
> >> 
> >> First off, keep in mind that the SSD firmware doing compression
> >> only really helps with wear-leveling.  Doing it in the filesystem
> >> will help not only with that, but will also give you more space to
> >> work with.
> > While also reducing the ability of the SSD to wear-level. The more
> > data I fit on the SSD, the less it can wear-level. And the better I
> > compress that data, the less it can wear-level.
> 
> No, the better you compress the data, the _less_ data you are
> physically putting on the SSD, just like compressing a file makes it
> take up less space.  This actually makes it easier for the firmware
> to do wear-leveling.  Wear-leveling is entirely about picking where
> to put data, and by reducing the total amount of data you are writing
> to the SSD, you're making that decision easier for the firmware, and
> also reducing the number of blocks of flash memory needed (which also
> helps with SSD life expectancy because it translates to fewer erase
> cycles).

On one hand I can go with this, but:

If I fill the SSD 99% with already compressed data, in case it 
compresses itself for wear leveling, it has less chance to wear level 
than with 99% of not yet compressed data that it could compress itself.

That was the point I was trying to make.

Sure, with a fill rate of about 46% for home, compression would help the
wear leveling. And if the controller does not compress at all, it would
help as well.

Hmmm, maybe I enable "zstd", but on the other hand I save CPU cycles
by not enabling it.
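
If I do, it would just be a mount option change, e.g. an fstab entry along
these lines (illustrative, kernel 4.14 or newer needed for zstd); only newly
written data gets compressed with the new algorithm:

/dev/mapper/sata-home  /home  btrfs  defaults,compress=zstd  0  0

or on a running system:

% mount -o remount,compress=zstd /home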

> > However… I am not all that convinced that it would benefit me as
> > long as I have enough space. That SSD replacement more than doubled
> > capacity from about 680 GB to 1480 GB. I have a ton of free space in
> > the filesystems – usage of /home is only 46% for example – and
> > there are 96 GiB completely unused in LVM on the Crucial SSD and
> > even more than 183 GiB completely unused on Samsung SSD. The system
> > is doing weekly "fstrim" on all filesystems. I think that this is
> > more than is needed for the longevity of the SSDs, but well
> > actually I just don´t need the space, so…
> > 
> > Of course, in case I manage to fill up all that space, I consider
> > using compression. Until then, I am not all that convinced that I´d
> > benefit from it.
> > 
> > Of course it may increase read speeds and in case of nicely
> > compressible data also write speeds, I am not sure whether it even
> > matters. Also it uses up some CPU cycles on a dual core (+
> > hyperthreading) Sandybridge mobile i5. While I am not sure about
> > it, I bet also having larger possible extent sizes may help a bit.
> > As well as no compression may also help a bit with fragmentation.
> 
> It generally does actually. Less data physically on the device means
> lower chances of fragmentation.  In your case, it may not improve

I thought "no compression" may help with fragmentation, but I think you
believe that "compression" helps with fragmentation and that you
misunderstood what I wrote.

> speed much though (your i5 _probably_ can't compress data much faster
> than it can access your SSD's, which means you likely won't see much
> performance benefit other than reducing fragmentation).
> 
> > Well putting this to a (non-scientific) test:
> > 
> > […]/.local/share/akonadi/db_data/akonadi> du -sh * | sort -rh | head
> > -5 3,1Gparttable.ibd
> > 
> > […]/.local/share/akonadi/db_data/akonadi> filefrag parttable.ibd
> > parttable.ibd: 11583 extents found
> > 
> > Hmmm, already quite many extents after just about one week with the
> > new filesystem. On the old filesystem I had somewhat around
> > 4-5 ex

Re: Experiences on BTRFS Dual SSD RAID 1 with outage of one SSD

2018-08-17 Thread Martin Steigerwald
Hi Roman.

Now with proper CC.

Roman Mamedov - 17.08.18, 14:50:
> On Fri, 17 Aug 2018 14:28:25 +0200
> Martin Steigerwald wrote:
> > > First off, keep in mind that the SSD firmware doing compression
> > > only really helps with wear-leveling.  Doing it in the filesystem
> > > will help not only with that, but will also give you more space to
> > > work with.
> > While also reducing the ability of the SSD to wear-level. The more
> > data I fit on the SSD, the less it can wear-level. And the better I
> > compress that data, the less it can wear-level.
> 
> Do not consider SSD "compression" as a factor in any of your
> calculations or planning. Modern controllers do not do it anymore,
> the last ones that did are SandForce, and that's 2010 era stuff. You
> can check for yourself by comparing write speeds of compressible vs
> incompressible data, it should be the same. At most, the modern ones
> know to recognize a stream of binary zeroes and have a special case
> for that.

Interesting. Do you have any backup for your claim?

> As for general comment on this thread, always try to save the exact
> messages you get when troubleshooting or getting failures from your
> system. Saying just "was not able to add" or "btrfs replace not
> working" without any exact details isn't really helpful as a bug
> report or even as a general "experiences" story, as we don't know
> what was the exact cause of those, could that have been avoided or
> worked around, not to mention what was your FS state at the time (as
> in "btrfs fi show" and "fi df").

I had a screen.log, but I put it on the filesystem after the 
backup was made, so it was lost.

Anyway, the reason for not being able to add the device was the read 
only state of the BTRFS, as I wrote. Same goes for replace. I was able 
to read the error message just fine. AFAIR the exact wording was "read 
only filesystem".

In any case: It was an experience report, not a request for help, so I don´t
see why exact error messages are absolutely needed. If it were a support
inquiry, that would be different, I agree.

Thanks,
-- 
Martin




Re: Experiences on BTRFS Dual SSD RAID 1 with outage of one SSD

2018-08-17 Thread Martin Steigerwald
Austin S. Hemmelgarn - 17.08.18, 15:01:
> On 2018-08-17 08:50, Roman Mamedov wrote:
> > On Fri, 17 Aug 2018 14:28:25 +0200
> > Martin Steigerwald wrote:
> >>> First off, keep in mind that the SSD firmware doing compression
> >>> only really helps with wear-leveling.  Doing it in the filesystem
> >>> will help not only with that, but will also give you more space to
> >>> work with.
> >> While also reducing the ability of the SSD to wear-level. The more
> >> data I fit on the SSD, the less it can wear-level. And the better
> >> I compress that data, the less it can wear-level.
> > 
> > Do not consider SSD "compression" as a factor in any of your
> > calculations or planning. Modern controllers do not do it anymore,
> > the last ones that did are SandForce, and that's 2010 era stuff.
> > You can check for yourself by comparing write speeds of
> > compressible vs incompressible data, it should be the same. At
> > most, the modern ones know to recognize a stream of binary zeroes
> > and have a special case for that.
> 
> All that testing write speeds for compressible versus incompressible
> data tells you is if the SSD is doing real-time compression of data,
> not if they are doing any compression at all.  Also, this test only
> works if you turn the write-cache on the device off.

As the data still needs to be transferred to the SSD, I bet that at least 
when the SATA connection is maxed out you won´t see any difference in write 
speed whether the SSD compresses in real time or not.

> Besides, you can't prove 100% for certain that any manufacturer who
> does not sell their controller chips isn't doing this, which means
> there are a few manufacturers that may still be doing it.

Who really knows what SSD controller manufacturers are doing? I have not 
seen any Open Channel SSD stuff for laptops so far.

Thanks,
-- 
Martin




Re: Experiences on BTRFS Dual SSD RAID 1 with outage of one SSD

2018-08-17 Thread Martin Steigerwald
Thanks for your detailed answer.  

Austin S. Hemmelgarn - 17.08.18, 13:58:
> On 2018-08-17 05:08, Martin Steigerwald wrote:
[…]
> > I have seen a discussion about the limitation in point 2. That
> > allowing to add a device and make it into RAID 1 again might be
> > dangerous, because of the system chunk and probably other reasons. I did
> > not completely read and understand it though.
> > 
> > So I still don´t get it, cause:
> > 
> > Either it is a RAID 1, then, one disk may fail and I still have
> > *all*
> > data. Also for the system chunk, which according to btrfs fi df /
> > btrfs fi sh was indeed RAID 1. If so, then period. Then I don´t see
> > why it would need to disallow me to make it into an RAID 1 again
> > after one device has been lost.
> > 
> > Or it is no RAID 1 and then what is the point to begin with? As I
> > was able to copy all data off the degraded mount, I´d say it was a
> > RAID 1.
> > 
> > (I know that BTRFS RAID 1 is not a regular RAID 1 anyway, but just
> > does two copies regardless of how many drives you use.)
> 
> So, what's happening here is a bit complicated.  The issue is entirely
> with older kernels that are missing a couple of specific patches, but
> it appears that not all distributions have their kernels updated to
> include those patches yet.
> 
> In short, when you have a volume consisting of _exactly_ two devices
> using raid1 profiles that is missing one device, and you mount it
> writable and degraded on such a kernel, newly created chunks will be
> single-profile chunks instead of raid1 chunks with one half missing.
> Any write has the potential to trigger allocation of a new chunk, and
> more importantly any _read_ has the potential to trigger allocation of
> a new chunk if you don't use the `noatime` mount option (because a
> read will trigger an atime update, which results in a write).
> 
> When older kernels then go and try to mount that volume a second time,
> they see that there are single-profile chunks (which can't tolerate
> _any_ device failures), and refuse to mount at all (because they
> can't guarantee that metadata is intact).  Newer kernels fix this
> part by checking per-chunk if a chunk is degraded/complete/missing,
> which avoids this because all the single chunks are on the remaining
> device.

How new does the kernel need to be for that to happen?

Do I get this right that it would be the kernel used for recovery, i.e.
the one on the live distro, that needs to be new enough? The one on this
laptop meanwhile is already 4.18.1.

I used the latest GRML stable release 2017.05, which has a 4.9 kernel.

> As far as avoiding this in the future:

I hope that with the new Samsung 860 Pro together with the existing
Crucial m500 I am spared from this for years to come. That Crucial SSD,
according to its SMART lifetime-used status, still has quite some time
to go.

> * If you're just pulling data off the device, mark the device
> read-only in the _block layer_, not the filesystem, before you mount
> it.  If you're using LVM, just mark the LV read-only using LVM
> commands.  This will make 100% certain that nothing gets written to
> the device, and thus makes sure that you won't accidentally cause
> issues like this.

> * If you're going to convert to a single device,
> just do it and don't stop it part way through.  In particular, make
> sure that your system will not lose power.

> * Otherwise, don't mount the volume unless you know you're going to
> repair it.

Thanks for those. Good to keep in mind.
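
A minimal sketch of that advice, with device and LV names purely illustrative:

% blockdev --setro /dev/sdb2          # plain block device read-only
% lvchange --permission r vg/home     # or mark the LV itself read-only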

> > For this laptop it was not all that important but I wonder about
> > BTRFS RAID 1 in enterprise environment, cause restoring from backup
> > adds a significantly higher downtime.
> > 
> > Anyway, creating a new filesystem may have been better here anyway,
> > cause it replaced an BTRFS that aged over several years with a new
> > one. Due to the increased capacity and due to me thinking that
> > Samsung 860 Pro compresses itself, I removed LZO compression. This
> > would also give larger extents on files that are not fragmented or
> > only slightly fragmented. I think that Intel SSD 320 did not
> > compress, but Crucial m500 mSATA SSD does. That has been the
> > secondary SSD that still had all the data after the outage of the
> > Intel SSD 320.
> 
> First off, keep in mind that the SSD firmware doing compression only
> really helps with wear-leveling.  Doing it in the filesystem will help
> not only with that, but will also give you more space to work with.

While also reducing the ability of the SSD to wear-level. The more data
I fit on the SSD, the less it can wear-level. And the better I compress
that data, the less it can wear-level.

Experiences on BTRFS Dual SSD RAID 1 with outage of one SSD

2018-08-17 Thread Martin Steigerwald
Hi!

This happened about two weeks ago. I already dealt with it and all is 
well.

Linux hung on suspend so I switched off this ThinkPad T520 forcefully.
After that it did not boot the operating system anymore. The Intel SSD 320
(latest firmware, which should fix this bug, but apparently does not) now
reports only 8 MiB of capacity. Those 8 MiB just contain zeros.

Access via GRML and "mount -fo degraded" worked. I initially was even 
able to write onto this degraded filesystem. First I copied all data to 
a backup drive.

I even started a balance to "single" so that it would work with one SSD.

But later I learned that a secure erase may recover the Intel SSD 320 and,
since I had no other SSD at hand, I did that. And yes, it did recover it.
So I canceled the balance.

I partitioned the Intel SSD 320 and put LVM on it, just as I had it before.
But at that time I was not able to mount the degraded BTRFS on the other SSD
as writable anymore, not even with "-f" ("I know what I am doing"). Thus I
was not able to add a device to it and btrfs balance it to RAID 1. Even
"btrfs replace" was not working.

I thus formatted a new BTRFS RAID 1 and restored.

A week later I migrated the Intel SSD 320 to a Samsung 860 Pro. Again 
via one full backup and restore cycle. However, this time I was able to 
copy most of the data of the Intel SSD 320 with "mount -fo degraded" via 
eSATA and thus the copy operation was way faster.

So conclusion:

1. Pro: BTRFS RAID 1 really protected my data against a complete SSD 
outage.

2. Con:  It does not allow me to add a device and balance to RAID 1 or 
replace one device that is already missing at this time.

3. I keep using BTRFS RAID 1 on two SSDs for often changed, critical 
data.

4. And yes, I know it does not replace a backup. As it was holidays and
I was lazy, the backup was two weeks old already, so I was happy to have all
my data still on the other SSD.

5. The kernel error messages when mounting without "-o degraded" are
less than helpful. They indicate a corrupted filesystem instead of just
telling that one device is missing and "-o degraded" would help here.


I have seen a discussion about the limitation in point 2. That allowing
to add a device and make it into RAID 1 again might be dangerous, because
of the system chunk and probably other reasons. I did not completely read
and understand it though.

So I still don´t get it, because:

Either it is a RAID 1, then one disk may fail and I still have *all*
data. Also for the system chunk, which according to btrfs fi df / btrfs
fi sh was indeed RAID 1. If so, then period. Then I don´t see why it
would need to disallow me to make it into a RAID 1 again after one
device has been lost.

Or it is no RAID 1 and then what is the point to begin with? As I was
able to copy all data off the degraded mount, I´d say it was a RAID 1.

(I know that BTRFS RAID 1 is not a regular RAID 1 anyway, but just does 
two copies regardless of how many drives you use.)
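
For completeness, the sequence I expected to work, written from memory with
hypothetical device paths, assuming the degraded filesystem can still be
mounted writable:

% mount -o degraded /dev/sdb2 /mnt
% btrfs device add /dev/sdc2 /mnt
% btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt

or, replacing the missing device directly (1 being the devid of the missing
device):

% btrfs replace start 1 /dev/sdc2 /mnt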


For this laptop it was not all that important, but I wonder about BTRFS
RAID 1 in an enterprise environment, because restoring from backup adds
significantly more downtime.

Anyway, creating a new filesystem may have been better here anyway,
because it replaced a BTRFS that had aged over several years with a new one.
Due to the increased capacity and due to me thinking that the Samsung 860
Pro compresses itself, I removed LZO compression. This would also give
larger extents on files that are not fragmented or only slightly
fragmented. I think that the Intel SSD 320 did not compress, but the
Crucial m500 mSATA SSD does. That has been the secondary SSD that still
had all the data after the outage of the Intel SSD 320.


Overall I am happy, because BTRFS RAID 1 gave me access to the data after
the SSD outage. That is the most important thing about it for me.

Thanks,
-- 
Martin




Re: BTRFS and databases

2018-08-02 Thread Martin Steigerwald
Andrei Borzenkov - 02.08.18, 12:35:
> Sent from iPhone
> 
> > On 2 Aug 2018, at 12:16, Martin Steigerwald
> > wrote:
> > Hugo Mills - 01.08.18, 10:56:
> >>> On Wed, Aug 01, 2018 at 05:45:15AM +0200, MegaBrutal wrote:
> >>> I know it's a decade-old question, but I'd like to hear your
> >>> thoughts
> >>> of today. By now, I became a heavy BTRFS user. Almost everywhere I
> >>> use BTRFS, except in situations when it is obvious there is no
> >>> benefit (e.g. /var/log, /boot). At home, all my desktop, laptop
> >>> and
> >>> server computers are mainly running on BTRFS with only a few file
> >>> systems on ext4. I even installed BTRFS in corporate productive
> >>> systems (in those cases, the systems were mainly on ext4; but
> >>> there
> >>> were some specific file systems those exploited BTRFS features).
> >>> 
> >>> But there is still one question that I can't get over: if you
> >>> store
> >>> a
> >>> database (e.g. MySQL), would you prefer having a BTRFS volume
> >>> mounted
> >>> with nodatacow, or would you just simply use ext4?
> >>> 
> >>   Personally, I'd start with btrfs with autodefrag. It has some
> >> 
> >> degree of I/O overhead, but if the database isn't
> >> performance-critical and already near the limits of the hardware,
> >> it's unlikely to make much difference. Autodefrag should keep the
> >> fragmentation down to a minimum.
> > 
> > I read that autodefrag would only help with small databases.
> 
> I wonder if anyone actually
> 
> a) quantified performance impact
> b) analyzed the cause
> 
> I work with NetApp for a long time and I can say from first hand
> experience that fragmentation had zero impact on OLTP workload. It
> did affect backup performance as was expected, but this could be
> fixed by periodic reallocation (defragmentation).
> 
> And even that needed quite some time to observe (years) on pretty high
>  load database with regular backup and replication snapshots.
> 
> If btrfs is so susceptible to fragmentation, what is the reason for
> it?

At the end of my original mail I mentioned a blog article that also had
some performance graphs. Did you actually read it?

Thanks,
-- 
Martin




Re: BTRFS and databases

2018-08-02 Thread Martin Steigerwald
Hugo Mills - 01.08.18, 10:56:
> On Wed, Aug 01, 2018 at 05:45:15AM +0200, MegaBrutal wrote:
> > I know it's a decade-old question, but I'd like to hear your
> > thoughts
> > of today. By now, I became a heavy BTRFS user. Almost everywhere I
> > use BTRFS, except in situations when it is obvious there is no
> > benefit (e.g. /var/log, /boot). At home, all my desktop, laptop and
> > server computers are mainly running on BTRFS with only a few file
> > systems on ext4. I even installed BTRFS in corporate productive
> > systems (in those cases, the systems were mainly on ext4; but there
> > were some specific file systems those exploited BTRFS features).
> > 
> > But there is still one question that I can't get over: if you store
> > a
> > database (e.g. MySQL), would you prefer having a BTRFS volume
> > mounted
> > with nodatacow, or would you just simply use ext4?
> 
>Personally, I'd start with btrfs with autodefrag. It has some
> degree of I/O overhead, but if the database isn't performance-critical
> and already near the limits of the hardware, it's unlikely to make
> much difference. Autodefrag should keep the fragmentation down to a
> minimum.

I read that autodefrag would only help with small databases.

I also read that even on SSDs there is a notable performance penalty. A
4.2 GiB Akonadi database for tons of mails appears to work okayish on a
dual SSD BTRFS RAID 1 with LZO compression here. However, I have no
comparison, for example how it would run on XFS. And it is fragmented
quite a bit, see the example below for the largest file of 3 GiB – I know
this is in part also due to LZO compression.

[…].local/share/akonadi/db_data/akonadi> time /usr/sbin/filefrag parttable.ibd
parttable.ibd: 45380 extents found
/usr/sbin/filefrag parttable.ibd  0,00s user 0,86s system 41% cpu 2,054 total

However it digs out those extents quite fast.

I would not feel comfortable with setting this file to nodatacow.
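
For reference, the usual way to get nodatacow is per directory, which new
files then inherit; it has to be set before any data is written, and it
disables checksums (and compression) for those files. Path hypothetical:

% mkdir /srv/mysql
% chattr +C /srv/mysql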


However I wonder: Is this it? Is there nothing that can be improved in 
BTRFS to handle database and VM files in a better way, without altering 
any default settings?

Is it also an issue on ZFS? ZFS also does copy on write. How does ZFS
handle this? Can anything be learned from it? I never heard people
complain about poor database performance on ZFS, but… I don´t use it and
I am not subscribed to any ZFS mailing lists, so they may have similar
issues and I just do not know it.

Well there seems to be a performance penalty at least when compared to 
XFS:

About ZFS Performance
Yves Trudeau, May 15, 2018

https://www.percona.com/blog/2018/05/15/about-zfs-performance/

The article described how you can use NVMe devices as cache to mitigate 
the performance impact. That would hint that BTRFS with VFS Hot Data 
Tracking and relocating data to SSD or NVMe devices could be a way to 
set this up.


But as said, I read about bad database performance even on SSDs with
BTRFS. I cannot find the original reference at the moment, but I found
this for example; however, it is from 2015 (on kernel 4.0, which is a bit
old):

Friends don't let friends use BTRFS for OLTP
2015/09/16 by Tomas Vondra

https://blog.pgaddict.com/posts/friends-dont-let-friends-use-btrfs-for-oltp

Interestingly it also compares with ZFS, which is doing much better. So
maybe there is really something to be learned from ZFS.

I could not tell clearly whether the benchmark was run on an SSD; as Tomas
notes the "ssd" mount option, it might have been.

Thanks,
-- 
Martin




Re: Healthy amount of free space?

2018-07-17 Thread Martin Steigerwald
Nikolay Borisov - 17.07.18, 10:16:
> On 17.07.2018 11:02, Martin Steigerwald wrote:
> > Nikolay Borisov - 17.07.18, 09:20:
> >> On 16.07.2018 23:58, Wolf wrote:
> >>> Greetings,
> >>> I would like to ask what is a healthy amount of free space to
> >>> keep on each device for btrfs to be happy?
> >>> 
> >>> This is how my disk array currently looks like
> >>> 
> >>> [root@dennas ~]# btrfs fi usage /raid
> >>> 
> >>> Overall:
> >>> Device size:  29.11TiB
> >>> Device allocated: 21.26TiB
> >>> Device unallocated:7.85TiB
> >>> Device missing:  0.00B
> >>> Used: 21.18TiB
> >>> Free (estimated):  3.96TiB  (min: 3.96TiB)
> >>> Data ratio:   2.00
> >>> Metadata ratio:   2.00
> >>> Global reserve:  512.00MiB  (used: 0.00B)
> > 
> > […]
> > 
> >>> Btrfs does quite a good job of evenly using space on all devices.
> >>> Now, how low can I let that go? In other words, with how much
> >>> free/unallocated space remaining should I consider adding a new
> >>> disk?
> >> 
> >> Btrfs will start running into problems when you run out of
> >> unallocated space. So the best advice will be monitor your device
> >> unallocated, once it gets really low - like 2-3 gb I will suggest
> >> you run balance which will try to free up unallocated space by
> >> rewriting data more compactly into sparsely populated block
> >> groups. If after running balance you haven't really freed any
> >> space then you should consider adding a new drive and running
> >> balance to even out the spread of data/metadata.
> > 
> > What are these issues exactly?
> 
> For example if you have plenty of data space but your metadata is full
> then you will be getting ENOSPC.

Of that one I am aware.

This just did not happen so far.

I did not yet add it explicitly to the training slides, but I just made
myself a note to do that.

Anything else?

> > I have
> > 
> > % btrfs fi us -T /home
> > 
> > Overall:
> > Device size: 340.00GiB
> > Device allocated:340.00GiB
> > Device unallocated:2.00MiB
> > Device missing:  0.00B
> > Used:308.37GiB
> > Free (estimated): 14.65GiB  (min: 14.65GiB)
> > Data ratio:   2.00
> > Metadata ratio:   2.00
> > Global reserve:  512.00MiB  (used: 0.00B)
> > 
> >   Data  Metadata System
> > 
> > Id Path   RAID1 RAID1RAID1Unallocated
> > -- -- -   ---
> > 
> >  1 /dev/mapper/msata-home 165.89GiB  4.08GiB 32.00MiB 1.00MiB
> >  2 /dev/mapper/sata-home  165.89GiB  4.08GiB 32.00MiB 1.00MiB
> > 
> > -- -- -   ---
> > 
> >    Total  165.89GiB  4.08GiB 32.00MiB 2.00MiB
> >    Used   151.24GiB  2.95GiB 48.00KiB
>
> You already have only 33% of your metadata full so if your workload
> turned out to actually be making more metadata-heavy changed i.e
> snapshots you could exhaust this and get ENOSPC, despite having around
> 14gb of free data space. Furthermore this data space is spread around
> multiple data chunks, depending on how populated they are a balance
> could be able to free up unallocated space which later could be
> re-purposed for metadata (again, depending on what you are doing).

The filesystem above IMO is not fit for snapshots. It would fill up 
rather quickly, I think even when I balance metadata. Actually I tried 
this and as I remember it took at most a day until it was full.

If I read the above figures correctly, at maximum I could gain one additional
GiB by balancing metadata. That would not make a huge difference.

I bet I am already running this filesystem beyond recommendation, as I
bet many would argue it is too full already for regular usage… I do not
see the benefit of squeezing the last free space out of it just to fit
in another GiB.

So I still do not get the point why it would make sense to balance it at 
this point in time. Especially as this 1 GiB I could regain is not even 
needed. And I do not see th

Re: Healthy amount of free space?

2018-07-17 Thread Martin Steigerwald
Hi Nikolay.

Nikolay Borisov - 17.07.18, 09:20:
> On 16.07.2018 23:58, Wolf wrote:
> > Greetings,
> > I would like to ask what is a healthy amount of free space to
> > keep on each device for btrfs to be happy?
> > 
> > This is how my disk array currently looks like
> > 
> > [root@dennas ~]# btrfs fi usage /raid
> > 
> > Overall:
> > Device size:  29.11TiB
> > Device allocated: 21.26TiB
> > Device unallocated:7.85TiB
> > Device missing:  0.00B
> > Used: 21.18TiB
> > Free (estimated):  3.96TiB  (min: 3.96TiB)
> > Data ratio:   2.00
> > Metadata ratio:   2.00
> > Global reserve:  512.00MiB  (used: 0.00B)
[…]
> > Btrfs does quite a good job of evenly using space on all devices. Now,
> > how low can I let that go? In other words, with how much
> > free/unallocated space remaining should I consider adding a new disk?
> 
> Btrfs will start running into problems when you run out of unallocated
> space. So the best advice will be monitor your device unallocated,
> once it gets really low - like 2-3 gb I will suggest you run balance
> which will try to free up unallocated space by rewriting data more
> compactly into sparsely populated block groups. If after running
> balance you haven't really freed any space then you should consider
> adding a new drive and running balance to even out the spread of
> data/metadata.

What are these issues exactly?

I have

% btrfs fi us -T /home
Overall:
Device size: 340.00GiB
Device allocated:340.00GiB
Device unallocated:2.00MiB
Device missing:  0.00B
Used:308.37GiB
Free (estimated): 14.65GiB  (min: 14.65GiB)
Data ratio:   2.00
Metadata ratio:   2.00
Global reserve:  512.00MiB  (used: 0.00B)

                          Data       Metadata  System
Id Path                   RAID1      RAID1     RAID1     Unallocated
-- ---------------------- ---------- --------- --------- -----------
 1 /dev/mapper/msata-home  165.89GiB   4.08GiB  32.00MiB     1.00MiB
 2 /dev/mapper/sata-home   165.89GiB   4.08GiB  32.00MiB     1.00MiB
-- ---------------------- ---------- --------- --------- -----------
   Total                   165.89GiB   4.08GiB  32.00MiB     2.00MiB
   Used                    151.24GiB   2.95GiB  48.00KiB

on a RAID-1 filesystem that one, part of the time two, Plasma desktops +
KDEPIM and Akonadi + Baloo desktop search + you name it write to like
mad.

Since kernel 4.5 or 4.6 this simply works. Before that, BTRFS sometimes
crawled to a halt searching for free blocks, and I had to switch off
the laptop uncleanly. If that happened, a balance helped for a while.
But since 4.5 or 4.6 this has not happened anymore.
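
For reference, the kind of balance usually recommended for that situation
is a filtered one, roughly like this (usage thresholds illustrative):

% btrfs balance start -dusage=50 -musage=50 /home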

I found that with SLES 12 SP 3 or so, btrfsmaintenance runs a balance
weekly. This created an issue on our Proxmox + Ceph open source demo lab
based on Intel NUCs. That is for sure no recommended configuration for
Ceph, and Ceph is quite slow on these 2.5 inch harddisks and the 1 GBit
network link, despite some (albeit somewhat minimal, limited to 5 GiB)
m.2 SSD caching. What happened is that the VM crawled to a halt and the
kernel gave "task hung for more than 120 seconds" messages. The VM was
basically unusable during the balance. Sure, that should not happen with
a "proper" setup, but it also did not happen without the automatic
balance.

Also, what would happen on a hypervisor setup with several thousand
VMs on BTRFS, when several hundred of them decide to start a balance at
a similar time? It could probably bring the I/O system below them to a
halt, as many enterprise storage systems are designed to sustain burst
I/O loads, but not maximum utilization over an extended period of time.

I am really wondering what to recommend in my Linux performance tuning 
and analysis courses. On my own laptop I do not do regular balances so 
far. Due to my thinking: If it is not broken, do not fix it.

My personal opinion here also is: If the filesystem degrades so much
that it becomes unusable without regular maintenance from user space,
the filesystem needs to be fixed. Ideally I would not have to worry
about whether to regularly balance a BTRFS or not. In other words: I
should not have to visit a performance analysis and tuning course in
order to use a computer with a BTRFS filesystem.

Thanks,
-- 
Martin




Re: [PATCH v2 1/2] btrfs: Check each block group has corresponding chunk at mount time

2018-07-03 Thread Martin Steigerwald
Nikolay Borisov - 03.07.18, 11:08:
> On  3.07.2018 11:47, Qu Wenruo wrote:
> > On 2018年07月03日 16:33, Nikolay Borisov wrote:
> >> On  3.07.2018 11:08, Qu Wenruo wrote:
> >>> Reported in https://bugzilla.kernel.org/show_bug.cgi?id=199837, if
> >>> a
> >>> crafted btrfs with incorrect chunk<->block group mapping, it could
> >>> leads to a lot of unexpected behavior.
> >>> 
> >>> Although the crafted image can be catched by block group item
> >>> checker
> >>> added in "[PATCH] btrfs: tree-checker: Verify block_group_item",
> >>> if one crafted a valid enough block group item which can pass
> >>> above check but still mismatch with existing chunk, it could
> >>> cause a lot of undefined behavior.
> >>> 
> >>> This patch will add extra block group -> chunk mapping check, to
> >>> ensure we have a completely matching (start, len, flags) chunk
> >>> for each block group at mount time.
> >>> 
> >>> Reported-by: Xu Wen 
> >>> Signed-off-by: Qu Wenruo 
> >>> ---
> >>> changelog:
> >>> 
> >>> v2:
> >>>   Add better error message for each mismatch case.
> >>>   Rename function name, to co-operate with later patch.
> >>>   Add flags mismatch check.
> >>> 
> >>> ---
> >> 
> >> It's getting really hard to keep track of the various validation
> >> patches you sent with multiple versions + new checks. Please batch
> >> everything in a topic series i.e "Making checks stricter" or some
> >> such and send everything again nicely packed, otherwise the risk
> >> of mis-merging is increased.
> > 
> > Indeed, I'll send the branch and push it to github.
> > 
> >> I now see that Gu Jinxiang from fujitsu also started sending
> >> validation fixes.
> > 
> > No need to worry, that will be the only patch related to that thread
> > of bugzilla from Fujitsu.
> > As all the other cases can be addressed by my patches, sorry Fujitsu
> > guys :)> 
> >> Also for evry patch which fixes a specific issue from one of the
> >> reported on bugzilla.kernel.org just use the Link: tag to point to
> >> the original report on bugzilla that will make it easier to relate
> >> the fixes to the original report.
> > 
> > Never heard of "Link:" tag.
> > Maybe it's a good idea to added it to "submitting-patches.rst"?
> 
> I guess it's not officially documented but if you do git log --grep
> "Link:" you'd see quite a lot of patches actually have a Link pointing
> to the original thread if it has sparked some pertinent discussion.
> In this case those patches are a direct result of a bugzilla
> bugreport so having a Link: tag makes sense.

For Bugzilla reports I saw something like

Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=43511

in a patch I was Cc´d on.

Of course that only applies if the patch in question fixes the
reported bug.

> In the example of the qgroup patch I sent yesterday resulting from
> Misono's report there was also an involved discussion hence I added a
> link to the original thread.
[…]
-- 
Martin




Re: "decompress failed" in 1-2 files always causes kernel oops, check/scrub pass

2018-05-12 Thread Martin Steigerwald
Hey James.

james harvey - 12.05.18, 07:08:
> 100% reproducible, booting from disk, or even Arch installation ISO.
> Kernel 4.16.7.  btrfs-progs v4.16.
> 
> Reading one of two journalctl files causes a kernel oops.  Initially
> ran into it from "journalctl --list-boots", but cat'ing the file does
> it too.  I believe this shows there's compressed data that is invalid,
> but its btrfs checksum is valid.  I've cat'ed every file on the
> disk, and luckily have the problems narrowed down to only these 2
> files in /var/log/journal.
> 
> This volume has always been mounted with lzo compression.
> 
> scrub has never found anything, and have ran it since the oops.
> 
> Found a user a few years ago who also ran into this, without
> resolution, at:
> https://www.spinics.net/lists/linux-btrfs/msg52218.html
> 
> 1. Cat'ing a (non-essential) file shouldn't be able to bring down the
> system.
> 
> 2. If this is infact invalid compressed data, there should be a way to
> check for that.  Btrfs check and scrub pass.

I think systemd-journald sets those files to nocow on BTRFS in order to
reduce fragmentation. That means no checksums, no snapshots, no nothing.
I just removed /var/log/journal and thus disabled journalling to disk.
It is sufficient for me to have the recent state in /run/journal.

Can you confirm nocow being set via lsattr on those files?
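
Something along these lines should show it (path illustrative; a capital C
among the flags means No_COW):

% lsattr /var/log/journal/*/system.journal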

Still they should be decompressible just fine.

> Hardware is fine.  Passes memtest86+ in SMP mode.  Works fine on all
> other files.
> 
> 
> 
> [  381.869940] BUG: unable to handle kernel paging request at
> 00390e50 [  381.870881] BTRFS: decompress failed
[…]
-- 
Martin




Re: Read before you deploy btrfs + zstd

2017-11-15 Thread Martin Steigerwald
David Sterba - 15.11.17, 15:39:
> On Tue, Nov 14, 2017 at 07:53:31PM +0100, David Sterba wrote:
> > On Mon, Nov 13, 2017 at 11:50:46PM +0100, David Sterba wrote:
> > > Up to now, there are no bootloaders supporting ZSTD.
> > 
> > I've tried to implement the support to GRUB, still incomplete and hacky
> > but most of the code is there.  The ZSTD implementation is copied from
> > kernel. The allocators need to be properly set up, as it needs to use
> > grub_malloc/grub_free for the workspace thats called from some ZSTD_*
> > functions.
> > 
> > https://github.com/kdave/grub/tree/btrfs-zstd
> 
> The branch is now in a state that can be tested. Turns out the memory
> requirements are too much for grub, so the boot fails with "not enough
> memory". The calculated value
> 
> ZSTD_BTRFS_MAX_INPUT: 131072
> ZSTD_DStreamWorkspaceBound with ZSTD_BTRFS_MAX_INPUT: 549424
> 
> This is not something I could fix easily, we'd probably need a tuned
> version of ZSTD for grub constraints. Adding Nick to CC.

Somehow I am happy that I still have a plain Ext4 for /boot. :)

Thanks for looking into Grub support anyway.

Thanks,
-- 
Martin


Re: Read before you deploy btrfs + zstd

2017-11-14 Thread Martin Steigerwald
David Sterba - 14.11.17, 19:49:
> On Tue, Nov 14, 2017 at 08:34:37AM +0100, Martin Steigerwald wrote:
> > Hello David.
> > 
> > David Sterba - 13.11.17, 23:50:
> > > while 4.14 is still fresh, let me address some concerns I've seen on
> > > linux
> > > forums already.
> > > 
> > > The newly added ZSTD support is a feature that has broader impact than
> > > just the runtime compression. The btrfs-progs understand filesystem with
> > > ZSTD since 4.13. The remaining key part is the bootloader.
> > > 
> > > Up to now, there are no bootloaders supporting ZSTD. This could lead to
> > > an
> > > unmountable filesystem if the critical files under /boot get
> > > accidentally
> > > or intentionally compressed by ZSTD.
> > 
> > But otherwise ZSTD is safe to use? Are you aware of any other issues?
> 
> No issues from my own testing or reported by other users.

Thanks to you and the others. I think I will try this soon.

Thanks,
-- 
Martin


Re: Read before you deploy btrfs + zstd

2017-11-13 Thread Martin Steigerwald
Hello David.

David Sterba - 13.11.17, 23:50:
> while 4.14 is still fresh, let me address some concerns I've seen on linux
> forums already.
> 
> The newly added ZSTD support is a feature that has broader impact than
> just the runtime compression. The btrfs-progs understand filesystem with
> ZSTD since 4.13. The remaining key part is the bootloader.
> 
> Up to now, there are no bootloaders supporting ZSTD. This could lead to an
> unmountable filesystem if the critical files under /boot get accidentally
> or intentionally compressed by ZSTD.

But otherwise ZSTD is safe to use? Are you aware of any other issues?

I am considering switching from LZO to ZSTD on this ThinkPad T520 with Sandybridge.
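
If I do, existing files would keep their LZO extents until rewritten; as far
as I know they can be recompressed explicitly, at the cost of breaking
reflinks to snapshots (path illustrative):

% btrfs filesystem defragment -r -czstd /home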

Thank you,
-- 
Martin


Re: Data and metadata extent allocators [1/2]: Recap: The data story

2017-10-27 Thread Martin Steigerwald
Hello Hans,

Hans van Kranenburg - 27.10.17, 20:17:
> This is a followup to my previous threads named "About free space
> fragmentation, metadata write amplification and (no)ssd" [0] and
> "Experiences with metadata balance/convert" [1], exploring how good or
> bad btrfs can handle filesystems that are larger than your average
> desktop computer and/or which see a pattern of writing and deleting huge
> amounts of files of wildly varying sizes all the time.
[…]
> Q: How do I fight this and prevent getting into a situation where all
> raw space is allocated, risking a filesystem crash?
> A: Use btrfs balance to fight the symptoms. It reads data and writes it
> out again without the free space fragments.

What do you mean by a filesystem crash? Since kernel 4.5 or 4.6 I don´t see any
BTRFS related filesystem hangs anymore on the /home BTRFS dual SSD RAID 1 on my
laptop, which one or two copies of Akonadi, Baloo and other desktop related
stuff write *heavily* to and which has had all free space allocated into chunks
for a pretty long time:

merkaba:~> btrfs fi usage -T /home
Overall:
Device size: 340.00GiB
Device allocated:340.00GiB
Device unallocated:2.00MiB
Device missing:  0.00B
Used:290.32GiB
Free (estimated): 23.09GiB  (min: 23.09GiB)
Data ratio:   2.00
Metadata ratio:   2.00
Global reserve:  512.00MiB  (used: 0.00B)

  Data  Metadata System  
Id Path   RAID1 RAID1RAID1Unallocated
-- -- -   ---
 1 /dev/mapper/msata-home 163.94GiB  6.03GiB 32.00MiB 1.00MiB
 2 /dev/mapper/sata-home  163.94GiB  6.03GiB 32.00MiB 1.00MiB
-- -- -   ---
   Total  163.94GiB  6.03GiB 32.00MiB 2.00MiB
   Used   140.85GiB  4.31GiB 48.00KiB

I haven't done a balance on this filesystem in a long time (not since kernel 4.6).
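(For reference, if I ever had to reclaim allocated but mostly empty chunks, the 
usual tool would be a filtered balance – a minimal sketch, thresholds picked 
arbitrarily:)

# rewrite only data/metadata chunks that are at most half full
btrfs balance start -dusage=50 -musage=50 /home
# check progress from another shell
btrfs balance status /home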

Granted, my filesystem is smaller than the typical backup BTRFS. I do have two 3 
TB and one 1.5 TB SATA disks I back up to, and another 2 TB BTRFS on a backup 
server that I use for borgbackup (which doesn't yet use any snapshots and may be 
better off running as XFS, as it doesn't really need snapshots – borgbackup takes 
care of that. A BTRFS snapshot would only come in handy to be able to go back to 
a previous borgbackup repo in case it gets corrupted or damaged / deleted by an 
attacker who only has access to a non-privileged user). – However, all of these 
filesystems have plenty of free space currently and are not accessed daily.
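(The idea would be something along these lines – a rough sketch with made-up 
paths, assuming the borg repo lives in its own subvolume:)

# keep a read-only snapshot of the repo around, to roll back to in case the
# repo itself ever gets corrupted or deleted
btrfs subvolume snapshot -r /srv/borg /srv/.snapshots/borg-$(date +%Y-%m-%d)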

> Q: Why would it crash the file system when all raw space is allocated?
> Won't it start trying harder to reuse the free space inside?
> A: Yes, it will, for data. The big problem here is that allocation of a
> new metadata chunk when needed is not possible any more.

And at that point does it hang, or really crash?

[…]

> Q: Why do the pictures of my data block groups look like someone fired a
> shotgun at it. [3], [4]?
> A: Because the data extent allocator that is active when using the 'ssd'
> mount option both tends to ignore smaller free space fragments all the
> time, and also behaves in a way that causes more of them to appear. [5]
> 
> Q: Wait, why is there "ssd" in my mount options? Why does btrfs think my
> iSCSI attached lun is an SSD?
> A: Because it makes wrong assumptions based on the rotational attribute,
> which we can also see in sysfs.
> 
> Q: Why does this ssd mode ignore free space?
> A: Because it makes assumptions about the mapping of the addresses of
> the block device we see in linux and the storage in actual flash chips
> inside the ssd. Based on that information it decides where to write or
> where not to write any more.
> 
> Q: Does this make sense in 2017?
> A: No. The interesting relevant optimization when writing to an ssd
> would be to write all data together that will be deleted or overwritten
> together at the same time in the future. Since btrfs does not come with
> a time machine included, it can't do this. So, remove this behaviour
> instead. [6]
> 
> Q: What will happen when I use kernel 4.14 with the previously mentioned
> change, or if I change to the nossd mount option explicitely already?
> A: Relatively small free space fragments in existing chunks will
> actually be reused for new writes that fit, working from the beginning
> of the virtual address space upwards. It's like tetris, trying to
> completely fill up the lowest lines first. See the big difference in
> behavior when changing extent allocator happening at 16 seconds into
> this timelapse movie: [7] (virtual address space)

I see a difference in behavior but I do not yet fully understand what I am 
looking at.
 
> Q: But what if all my chunks have badly fragmented free space right now?
> A: If 

Something like ZFS Channel Programs for BTRFS & probably XFS or even VFS?

2017-10-03 Thread Martin Steigerwald
[repost. I didn't notice autocompletion gave me the wrong address for fsdevel; 
blacklisted now]

Hello.

What do you think of

http://open-zfs.org/wiki/Projects/ZFS_Channel_Programs

?

There are quite a few BTRFS maintenance operations, like the deduplication stuff. 
Also regular scrubs… and in certain circumstances balances can probably make 
sense.

In addition to this, XFS has gained scrub functionality as well.

Now, putting the foundation for such functionality into the kernel I think 
would only be reasonable if it cannot be done purely within user space, so I 
wonder about the safety from other concurrent ZFS modifications and the atomicity 
that are mentioned on the wiki page. The second set of slides, those from the 
OpenZFS Developer Summit 2014, which are linked to on the wiki page, explain 
this in more detail. (I didn't look at the first ones, as I am no fan of 
slideshare.net and prefer a simple PDF to download and view locally anytime – not 
for privacy reasons alone, but also to avoid using a clunky web page over a 
wonderfully functional PDF viewer fat client like Okular.)

Also, I wonder about putting a Lua interpreter into the kernel, but it seems at 
least the NetBSD developers added one to their kernel with version 7.0 [1].

I also ask this because I have wondered about a kind of fsmaintd or volmaintd for 
quite a while, and thought… it would be nice to do this in a generic way, as 
BTRFS is not the only filesystem which supports maintenance operations. However, 
if it can all just as nicely be done in userspace, I am all for it.
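(To illustrate what I mean by doing it in userspace – a very rough sketch of such 
a maintenance job; mount point and thresholds are made up, and a real fsmaintd 
would need locking, scheduling and proper error reporting:)

#!/bin/sh
# periodic BTRFS maintenance: scrub first, then compact mostly empty chunks
MNT=/data
btrfs scrub start -Bd "$MNT" || exit 1
btrfs balance start -dusage=20 -musage=20 "$MNT"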

[1] http://www.netbsd.org/releases/formal-7/NetBSD-7.0.html
(tons of presentation PDFs on their site as well)

Thanks,
-- 
Martin





Re: 4.11.6 / more corruption / root 15455 has a root item with a more recent gen (33682) compared to the found root node (0)

2017-07-09 Thread Martin Steigerwald
Hello Duncan.

Duncan - 09.07.17, 11:17:
> Paul Jones posted on Sun, 09 Jul 2017 09:16:36 + as excerpted:
> >> Marc MERLIN - 08.07.17, 21:34:
> >> > This is now the 3rd filesystem I have (on 3 different machines) that
> >> > is getting corruption of some kind (on 4.11.6).
> >> 
> >> Anyone else getting corruptions with 4.11?
> >> 
> >> I happily switch back to 4.10.17 or even 4.9 if that is the case. I may
> >> even do so just from your reports. Well, yes, I will do exactly that. I
> >> just switch back for 4.10 for now. Better be safe, than sorry.
> > 
> > No corruption for me - I've been on 4.11 since about .2 and everything
> > seems fine. Currently on 4.11.8
> 
> No corruptions here either. 4.12.0 now, previously 4.12-rc5(ish, git),
> before that 4.11.0.
> 
> I have however just upgraded to new ssds then wiped and setup the old
[…]
> Also, all my btrfs are raid1 or dup for checksummed redundancy, and
> relatively small, the largest now 80 GiB per device, after the upgrade.
> And my use-case doesn't involve snapshots or subvolumes.
> 
> So any bug that is most likely on older filesystems, say those without
> the no-holes feature, for instance, or that doesn't tend to hit raid1 or
> dup mode, or that is less likely on small filesystems on fast ssds, or
> that triggers most often with reflinks and thus on filesystems with
> snapshots, is unlikely to hit me.

Hmmm, the BTRFS filesystems on my laptop are 3 to 5 or even more years old. I'll 
stick with 4.10 for now, I think.

The older ones are RAID 1 across two SSDs, the newer one is single device, on 
one SSD.

These filesystems haven't failed me in years, and since 4.5 or 4.6 even the "I 
search for free space" kernel hang (hung tasks and all that) is gone as well.

Thanks,
-- 
Martin


Re: 4.11.6 / more corruption / root 15455 has a root item with a more recent gen (33682) compared to the found root node (0)

2017-07-09 Thread Martin Steigerwald
Hello Marc.

Marc MERLIN - 08.07.17, 21:34:
> Sigh,
> 
> This is now the 3rd filesystem I have (on 3 different machines) that is
> getting corruption of some kind (on 4.11.6).

Anyone else getting corruptions with 4.11?

I'll happily switch back to 4.10.17 or even 4.9 if that is the case. I may even 
do so just from your reports. Well, yes, I will do exactly that. I'll just switch 
back to 4.10 for now. Better safe than sorry.

I know how you feel, Marc. I posted about a corruption on one of my backup 
hard disks here some time ago that btrfs check --repair wasn't able to handle. 
I redid that disk from scratch and it took a long, long time.

I agree with you that this has to stop. Until it does, I will never *ever* 
recommend this to a customer. Ideally there would be no corruption in stable 
kernels, especially when there is a .6 at the end of the version number. But if 
it happens… then it should be fixable. Other filesystems like Ext4 and XFS can 
do it… so this should be possible with BTRFS as well.

Thanks,
-- 
Martin


Re: runtime btrfsck

2017-05-10 Thread Martin Steigerwald
Stefan Priebe - Profihost AG - 10.05.17, 09:02:
> I'm now trying btrfs progs 4.10.2. Is anybody out there who can tell me
> something about the expected runtime or how to fix bad key ordering?

I had a similar issue which remained unresolved.

But I clearly saw that btrfs check was running in a loop, see thread:

[4.9] btrfs check --repair looping over file extent discount errors

So it would be interesting to see the exact output of btrfs check; maybe there 
are repeated block or inode numbers that would also indicate a loop.
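(A quick way to check for such a loop – a sketch, device and log file name made 
up:)

# capture the full check output, then see which lines repeat suspiciously often
btrfs check /dev/sdX1 2>&1 | tee btrfs-check.log
sort btrfs-check.log | uniq -c | sort -rn | head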

I was about to say that BTRFS is production ready before this issue happened. 
I still think that for a lot of setups it mostly is, as at least the "I get stuck 
on the CPU while searching for free space" issue seems to be gone since roughly 
kernel 4.5/4.6. I also think so regarding the absence of data loss: I was able 
to copy over all of the data I needed off the broken filesystem.

Yet, when it comes to btrfs check? It's still quite rudimentary if you ask me. 
So unless someone has a clever idea here and shares it with you, it may be 
necessary to back up anything you can from this filesystem and then start over 
from scratch. In my past experience, something like xfs_repair surpasses btrfs 
check in its ability to actually fix a broken filesystem by a great extent.

Ciao,
-- 
Martin


Re: [4.9] btrfs check --repair looping over file extent discount errors

2017-04-22 Thread Martin Steigerwald
Martin Steigerwald - 22.04.17, 20:01:
> Chris Murphy - 22.04.17, 09:31:
> > Is the file system created with no-holes?
> 
> I have how to find out about it and while doing accidentally set that

I didn't find out how to find out about it and…

> feature on another filesystem (btrfstune only seems to be able to enable
> the feature, not show the current state of it).
> 
> But as there is no notice of the feature being set as standard in manpage of
> mkfs.btrfs as of BTRFS tools 4.9.1 and as I didn´t set it myself, I best
> bet is that the feature is not enable on the filesystem.
> 
> Now I wonder… how to disable the feature on that other filesystem again.
-- 
Martin


Re: [4.9] btrfs check --repair looping over file extent discount errors

2017-04-22 Thread Martin Steigerwald
Hello Chris.

Chris Murphy - 22.04.17, 09:31:
> Is the file system created with no-holes?

I have how to find out about it and while doing accidentally set that feature 
on another filesystem (btrfstune only seems to be able to enable the feature, 
not show the current state of it).

But as there is no notice of the feature being set as standard in the manpage of 
mkfs.btrfs as of BTRFS tools 4.9.1, and as I didn't set it myself, my best bet 
is that the feature is not enabled on the filesystem.
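(One way to check might be to look at the incompat flags in the superblock – a 
sketch with a made-up device path; newer btrfs-progs decode the flag names such 
as NO_HOLES, older ones only print the raw value:)

btrfs inspect-internal dump-super /dev/sdX1 | grep -i incompat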

Now I wonder… how to disable the feature on that other filesystem again.

Thanks,


Re: [4.9] btrfs check --repair looping over file extent discount errors

2017-04-22 Thread Martin Steigerwald
Hello.

I am planning to copy the important data from the disk with the broken filesystem 
over to the disk with the good filesystem and then reformat the disk with the 
broken filesystem soon, probably in the course of the day… so in case you want 
any debug information before that, let me know ASAP.

Thanks,
Martin

Martin Steigerwald - 14.04.17, 21:35:
> Hello,
> 
> backup harddisk connected via eSATA. Hard kernel hang, mouse pointer
> freezing two times seemingly after finishing /home backup and creating new
> snapshot on source BTRFS SSD RAID 1 for / in order to backup it. I did
> scrubbed / and it appears to be okay, but I didn´t run btrfs check on it.
> Anyway deleting that subvolume works and I as I suspected an issue with the
> backup disk I started with that one.
> 
> I got
> 
> merkaba:~> btrfs --version
> btrfs-progs v4.9.1
> 
> merkaba:~> cat /proc/version
> Linux version 4.9.20-tp520-btrfstrim+ (martin@merkaba) (gcc version 6.3.0
> 20170321 (Debian 6.3.0-11) ) #6 SMP PREEMPT Mon Apr 3 11:42:17 CEST 2017
> 
> merkaba:~> btrfs fi sh feenwald
> Label: 'feenwald'  uuid: […]
> Total devices 1 FS bytes used 1.26TiB
> devid1 size 2.73TiB used 1.27TiB path /dev/sdc1
> 
> on Debian unstable on ThinkPad T520 connected via eSATA port on Minidock.
> 
> 
> I am now running btrfs check --repair on it after without --repair the
> command reported file extent discount errors and it appears to loop on the
> same file extent discount errors for ages. Any advice?
> 
> I do have another backup harddisk with BTRFS that worked fine today, so I do
> not need to recover that drive immediately. I may let it run for a little
> more time, but then will abort the repair process as I really think its
> looping just over and over and over the same issues again. At some time I
> may just copy all the stuff that is on that harddisk, but not on the other
> one over to the other one and mkfs.btrfs the filesystem again, but I´d
> rather like to know whats happening here.
> 
> Here is output:
> 
> merkaba:~> btrfs check --repair /dev/sdc1
> enabling repair mode
> Checking filesystem on /dev/sdc1
> [… UUID ommited …]
> checking extents
> Fixed 0 roots.
> checking free space cache
> cache and super generation don't match, space cache will be invalidated
> checking fs roots
> root 257 inode 4979842 errors 100, file extent discount
> Found file extent holes:
> start: 0, len: 78798848
> root 257 inode 4980212 errors 100, file extent discount
> Found file extent holes:
> start: 0, len: 143360
> root 257 inode 4980214 errors 100, file extent discount
> Found file extent holes:
> start: 0, len: 4227072
> root 257 inode 4979842 errors 100, file extent discount
> Found file extent holes:
> start: 0, len: 78798848
> root 257 inode 4980212 errors 100, file extent discount
> Found file extent holes:
> start: 0, len: 143360
> root 257 inode 4980214 errors 100, file extent discount
> Found file extent holes:
> start: 0, len: 4227072
> root 257 inode 4979842 errors 100, file extent discount
> Found file extent holes:
> start: 0, len: 78798848
> root 257 inode 4980212 errors 100, file extent discount
> Found file extent holes:
> start: 0, len: 143360
> root 257 inode 4980214 errors 100, file extent discount
> Found file extent holes:
> start: 0, len: 4227072
> root 257 inode 4979842 errors 100, file extent discount
> Found file extent holes:
> start: 0, len: 78798848
> root 257 inode 4980212 errors 100, file extent discount
> Found file extent holes:
> start: 0, len: 143360
> root 257 inode 4980214 errors 100, file extent discount
> Found file extent holes:
> start: 0, len: 4227072
> [… hours later …]
> root 257 inode 4979842 errors 100, file extent discount
> Found file extent holes:
> start: 0, len: 78798848
> root 257 inode 4980212 errors 100, file extent discount
> Found file extent holes:
> start: 0, len: 143360
> root 257 inode 4980214 errors 100, file extent discount
> Found file extent holes:
> start: 0, len: 4227072
> root 257 inode 4979842 errors 100, file extent discount
> Found file extent holes:
> start: 0, len: 78798848
> root 257 inode 4980212 errors 100, file extent discount
> Found file extent holes:
> start: 0, len: 143360
> root 257 inode 4980214 errors 100, file extent discount
> Found file extent holes:
> start: 0, len: 4227072
> root 257 inode 4979842 errors 100, file extent discount
> Found file extent holes:
> start: 0, len: 78798848
> root 257 inode 4980212 errors 100, file extent discount
> Found file extent holes:

[4.9] btrfs check --repair looping over file extent discount errors

2017-04-14 Thread Martin Steigerwald
Hello,

Backup harddisk connected via eSATA. Hard kernel hang, mouse pointer freezing, 
two times, seemingly after finishing the /home backup and creating a new snapshot 
of the source BTRFS SSD RAID 1 for / in order to back it up. I scrubbed / and it 
appears to be okay, but I didn't run btrfs check on it. Anyway, deleting that 
subvolume works, and as I suspected an issue with the backup disk, I started 
with that one.

I got

merkaba:~> btrfs --version
btrfs-progs v4.9.1

merkaba:~> cat /proc/version
Linux version 4.9.20-tp520-btrfstrim+ (martin@merkaba) (gcc version 6.3.0 
20170321 (Debian 6.3.0-11) ) #6 SMP PREEMPT Mon Apr 3 11:42:17 CEST 2017

merkaba:~> btrfs fi sh feenwald
Label: 'feenwald'  uuid: […]
Total devices 1 FS bytes used 1.26TiB
devid1 size 2.73TiB used 1.27TiB path /dev/sdc1

on Debian unstable on ThinkPad T520 connected via eSATA port on Minidock.


I am now running btrfs check --repair on it, after the command without --repair 
reported file extent discount errors, and it appears to loop over the same file 
extent discount errors for ages. Any advice?

I do have another backup harddisk with BTRFS that worked fine today, so I do 
not need to recover that drive immediately. I may let it run for a little more 
time, but then I will abort the repair process, as I really think it's looping 
over and over the same issues again. At some point I may just copy all the 
stuff that is on that harddisk but not on the other one over to the other one 
and mkfs.btrfs the filesystem again, but I'd rather like to know what's 
happening here.

Here is output:

merkaba:~> btrfs check --repair /dev/sdc1
enabling repair mode
Checking filesystem on /dev/sdc1
[… UUID omitted …]
checking extents
Fixed 0 roots.
checking free space cache
cache and super generation don't match, space cache will be invalidated
checking fs roots
root 257 inode 4979842 errors 100, file extent discount
Found file extent holes:
start: 0, len: 78798848
root 257 inode 4980212 errors 100, file extent discount
Found file extent holes:
start: 0, len: 143360
root 257 inode 4980214 errors 100, file extent discount
Found file extent holes:
start: 0, len: 4227072
root 257 inode 4979842 errors 100, file extent discount
Found file extent holes:
start: 0, len: 78798848
root 257 inode 4980212 errors 100, file extent discount
Found file extent holes:
start: 0, len: 143360
root 257 inode 4980214 errors 100, file extent discount
Found file extent holes:
start: 0, len: 4227072
root 257 inode 4979842 errors 100, file extent discount
Found file extent holes:
start: 0, len: 78798848
root 257 inode 4980212 errors 100, file extent discount
Found file extent holes:
start: 0, len: 143360
root 257 inode 4980214 errors 100, file extent discount
Found file extent holes:
start: 0, len: 4227072
root 257 inode 4979842 errors 100, file extent discount
Found file extent holes:
start: 0, len: 78798848
root 257 inode 4980212 errors 100, file extent discount
Found file extent holes:
start: 0, len: 143360
root 257 inode 4980214 errors 100, file extent discount
Found file extent holes:
start: 0, len: 4227072
[… hours later …]
root 257 inode 4979842 errors 100, file extent discount
Found file extent holes:
start: 0, len: 78798848
root 257 inode 4980212 errors 100, file extent discount
Found file extent holes:
start: 0, len: 143360
root 257 inode 4980214 errors 100, file extent discount
Found file extent holes:
start: 0, len: 4227072
root 257 inode 4979842 errors 100, file extent discount
Found file extent holes:
start: 0, len: 78798848
root 257 inode 4980212 errors 100, file extent discount
Found file extent holes:
start: 0, len: 143360
root 257 inode 4980214 errors 100, file extent discount
Found file extent holes:
start: 0, len: 4227072
root 257 inode 4979842 errors 100, file extent discount
Found file extent holes:
start: 0, len: 78798848
root 257 inode 4980212 errors 100, file extent discount
Found file extent holes:
start: 0, len: 143360
root 257 inode 4980214 errors 100, file extent discount
Found file extent holes:
start: 0, len: 4227072

This basically seems to go on like this forever.
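(If I end up going the copy-and-recreate route, a dry run like the following 
would show what exists only on this disk – a sketch, mount points made up:)

rsync -n -av /mnt/feenwald/ /mnt/other-backup/ | less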

Thanks,
-- 
Martin


Re: Convert from RAID 5 to 10

2016-11-30 Thread Martin Steigerwald
On Wednesday, 30 November 2016, 12:09:23 CET, Chris Murphy wrote:
> On Wed, Nov 30, 2016 at 7:37 AM, Austin S. Hemmelgarn
> 
>  wrote:
> > The stability info could be improved, but _absolutely none_ of the things
> > mentioned as issues with raid1 are specific to raid1.  And in general, in
> > the context of a feature stability matrix, 'OK' generally means that there
> > are no significant issues with that specific feature, and since none of
> > the
> > issues outlined are specific to raid1, it does meet that description of
> > 'OK'.
> 
> Maybe the gotchas page needs a one or two liner for each profile's
> gotchas compared to what the profile leads the user into believing.
> The overriding gotcha with all Btrfs multiple device support is the
> lack of monitoring and notification other than kernel messages; and
> the raid10 actually being more like raid0+1 I think it certainly a
> gotcha, however 'man mkfs.btrfs' contains a grid that very clearly
> states raid10 can only safely lose 1 device.

Wow, that manpage is quite a resource.

The developers and documentation people have definitely improved the official 
BTRFS documentation.

Thanks,
-- 
Martin


Re: Convert from RAID 5 to 10

2016-11-30 Thread Martin Steigerwald
On Wednesday, 30 November 2016, 16:49:59 CET, Wilson Meier wrote:
> Am 30/11/16 um 15:37 schrieb Austin S. Hemmelgarn:
> > On 2016-11-30 08:12, Wilson Meier wrote:
> >> Am 30/11/16 um 11:41 schrieb Duncan:
> >>> Wilson Meier posted on Wed, 30 Nov 2016 09:35:36 +0100 as excerpted:
> >>>> Am 30/11/16 um 09:06 schrieb Martin Steigerwald:
> >>>>> Am Mittwoch, 30. November 2016, 10:38:08 CET schrieb Roman Mamedov:
[…]
> >> It is really disappointing to not have this information in the wiki
> >> itself. This would have saved me, and i'm quite sure others too, a lot
> >> of time.
> >> Sorry for being a bit frustrated.
> 
> I'm not angry or something like that :) .
> I just would like to have the possibility to read such information about
> the storage i put my personal data (> 3 TB) on its official wiki.

Anyone can get an account on the wiki and add notes there, so feel free.

You can even use footnotes or something like that. Maybe it would be good to 
add a paragraph there explaining that features are related to one another, so 
that while BTRFS RAID 1 for example might be quite okay, it may depend on 
features that are still flaky.

I myself rely quite a lot on BTRFS RAID 1 with lzo compression and it seems 
to work okay for me.
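(Concretely that just means mounting with compress=lzo – a sketch, adjust device 
and mount point:)

mount -o compress=lzo /dev/mapper/sata-home /home
# or the matching /etc/fstab line:
# /dev/mapper/sata-home  /home  btrfs  defaults,compress=lzo  0  0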

-- 
Martin


Re: Convert from RAID 5 to 10

2016-11-30 Thread Martin Steigerwald
On Wednesday, 30 November 2016, 10:38:08 CET, Roman Mamedov wrote:
> On Wed, 30 Nov 2016 00:16:48 +0100
> 
> Wilson Meier  wrote:
> > That said, btrfs shouldn't be used for other then raid1 as every other
> > raid level has serious problems or at least doesn't work as the expected
> > raid level (in terms of failure recovery).
> 
> RAID1 shouldn't be used either:
> 
> *) Read performance is not optimized: all metadata is always read from the
> first device unless it has failed, data reads are supposedly balanced
> between devices per PID of the process reading. Better implementations
> dispatch reads per request to devices that are currently idle.
> 
> *) Write performance is not optimized, during long full bandwidth sequential
> writes it is common to see devices writing not in parallel, but with a long
> periods of just one device writing, then another. (Admittedly have been
> some time since I tested that).
> 
> *) A degraded RAID1 won't mount by default.
> 
> If this was the root filesystem, the machine won't boot.
> 
> To mount it, you need to add the "degraded" mount option.
> However you have exactly a single chance at that, you MUST restore the RAID
> to non-degraded state while it's mounted during that session, since it
> won't ever mount again in the r/w+degraded mode, and in r/o mode you can't
> perform any operations on the filesystem, including adding/removing
> devices.
> 
> *) It does not properly handle a device disappearing during operation.
> (There is a patchset to add that).
> 
> *) It does not properly handle said device returning (under a
> different /dev/sdX name, for bonus points).
> 
> Most of these also apply to all other RAID levels.

So the stability matrix would need to be updated not to recommend any kind of 
BTRFS RAID 1 at the moment?

Actually I faced the BTRFS RAID 1 going read-only after the first attempt at 
mounting it "degraded" just a short time ago.

BTRFS still needs way more stability work it seems to me.

-- 
Martin


Re: degraded BTRFS RAID 1 not mountable: open_ctree failed, unable to find block group for 0

2016-11-17 Thread Martin Steigerwald
On Thursday, 17 November 2016, 12:05:31 CET, Chris Murphy wrote:
> I think the wiki should be updated to reflect that raid1 and raid10
> are mostly OK. I think it's grossly misleading to consider either as
> green/OK when a single degraded read write mount creates single chunks
> that will then prevent a subsequent degraded read write mount. And
> also the lack of various notifications of device faultiness I think
> make it less than OK also. It's not in the "do not use" category but
> it should be in the middle ground status so users can make informed
> decisions.

I agree – as the error reporting, I think, is indeed misleading. Feel free to edit 
it.

Ciao,
-- 
Martin


Re: degraded BTRFS RAID 1 not mountable: open_ctree failed, unable to find block group for 0

2016-11-16 Thread Martin Steigerwald
On Wednesday, 16 November 2016, 07:57:08 CET, Austin S. Hemmelgarn wrote:
> On 2016-11-16 06:04, Martin Steigerwald wrote:
> > Am Mittwoch, 16. November 2016, 16:00:31 CET schrieb Roman Mamedov:
> >> On Wed, 16 Nov 2016 11:55:32 +0100
> >> 
> >> Martin Steigerwald <martin.steigerw...@teamix.de> wrote:
[…]
> > As there seems to be no force option to override the limitation and I
> > do not feel like compiling my own btrfs-tools right now, I will use rsync
> > instead.
> 
> In a case like this, I'd trust rsync more than send/receive.  The
> following rsync switches might also be of interest:
> -a: This turns on a bunch of things almost everyone wants when using
> rsync, similar to the same switch for cp, just with even more added in.
> -H: This recreates hardlinks on the receiving end.
> -S: This recreates sparse files.
> -A: This copies POSIX ACL's
> -X: This copies extended attributes (most of them at least, there are a
> few that can't be arbitrarily written to).
> Pre-creating the subvolumes by hand combined with using all of those
> will get you almost everything covered by send/receive except for
> sharing of extents and ctime.

I usually use rsync -aAHXSP already :).
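(In practice that looks roughly like this, with the target subvolume created 
beforehand – paths are only examples:)

btrfs subvolume create /mnt/target/somesubvolume
rsync -aAHXSP /mnt/zeit/somesubvolume/ /mnt/target/somesubvolume/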

I was able to rsync all relevant data off the disk, which is now being erased 
by the shred command.

Thank you,
-- 
Martin


Re: degraded BTRFS RAID 1 not mountable: open_ctree failed, unable to find block group for 0

2016-11-16 Thread Martin Steigerwald
On Wednesday, 16 November 2016, 11:55:32 CET, you wrote:
> So mounting work although for some reason scrubbing is aborted (I had this
> issue a long time ago on my laptop as well). After removing /var/lib/btrfs 
> scrub status file for the filesystem:
> 
> merkaba:~> btrfs scrub start /mnt/zeit
> scrub started on /mnt/zeit, fsid […] (pid=9054)
> merkaba:~> btrfs scrub status /mnt/zeit
> scrub status for […]
> scrub started at Wed Nov 16 11:52:56 2016 and was aborted after 
> 00:00:00
> total bytes scrubbed: 0.00B with 0 errors
> 
> Anyway, I will now just rsync off the files.
> 
> Interestingly enough btrfs restore complained about looping over certain
> files… lets see whether the rsync or btrfs send/receive proceeds through.

I have an idea why scrubbing may not work:

The filesystem is mounted read-only, and on checksum errors on one disk, scrub 
would try to repair them with the good copy from another disk.

Yes, this is it:

merkaba:~>  btrfs scrub start -r /dev/satafp1/daten
scrub started on /dev/satafp1/daten, fsid […] (pid=9375)
merkaba:~>  btrfs scrub status /dev/satafp1/daten 
scrub status for […]
scrub started at Wed Nov 16 12:13:27 2016, running for 00:00:10
total bytes scrubbed: 45.53MiB with 0 errors

It would be helpful to receive a proper error message on this one.

Okay, seems today I learned quite something about BTRFS.

Thanks,

-- 
Martin Steigerwald  | Trainer

teamix GmbH
Südwestpark 43
90449 Nürnberg

Tel.:  +49 911 30999 55 | Fax: +49 911 30999 99
mail: martin.steigerw...@teamix.de | web:  http://www.teamix.de | blog: 
http://blog.teamix.de

Amtsgericht Nürnberg, HRB 18320 | Geschäftsführer: Oliver Kügow, Richard Müller

teamix Support Hotline: +49 911 30999-112
 
 *** Bitte liken Sie uns auf Facebook: facebook.com/teamix ***



Re: degraded BTRFS RAID 1 not mountable: open_ctree failed, unable to find block group for 0

2016-11-16 Thread Martin Steigerwald
On Wednesday, 16 November 2016, 16:00:31 CET, Roman Mamedov wrote:
> On Wed, 16 Nov 2016 11:55:32 +0100
> 
> Martin Steigerwald <martin.steigerw...@teamix.de> wrote:
> > I do think that above kernel messages invite such a kind of interpretation
> > tough. I took the "BTRFS: open_ctree failed" message as indicative to some
> > structural issue with the filesystem.
> 
> For the reason as to why the writable mount didn't work, check "btrfs fi df"
> for the filesystem to see if you have any "single" profile chunks on it:
> quite likely you did already mount it "degraded,rw" in the past *once*,
> after which those "single" chunks get created, and consequently it won't
> mount r/w anymore (without lifting the restriction on the number of missing
> devices as proposed).

That exactly explains it. I very likely did a degraded mount without ro on 
this disk already.
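(The check Roman describes is quick to do – a sketch, mount point as an example:)

# Data or Metadata lines with a "single" profile point to chunks created by a
# past degraded read-write mount; the GlobalReserve line always says "single"
# and can be ignored
btrfs fi df /mnt/zeit | grep -i single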

Funnily enough this creates another complication:

merkaba:/mnt/zeit#1> btrfs send somesubvolume | btrfs receive /mnt/someotherbtrfs
ERROR: subvolume /mnt/zeit/somesubvolume is not read-only

Yet:

merkaba:/mnt/zeit> btrfs property get somesubvolume
ro=false
merkaba:/mnt/zeit> btrfs property set somesubvolume ro true 
 
ERROR: failed to set flags for somesubvolume: Read-only file system

To me it seems the right logic would be to allow the send to proceed in case
the whole filesystem is read-only.

As there seems to be no force option to override the limitation and I
do not feel like compiling my own btrfs-tools right now, I will use rsync
instead.

Thanks,

-- 
Martin Steigerwald  | Trainer, teamix GmbH



Re: degraded BTRFS RAID 1 not mountable: open_ctree failed, unable to find block group for 0

2016-11-16 Thread Martin Steigerwald
On Wednesday, 16 November 2016, 15:43:36 CET, Roman Mamedov wrote:
> On Wed, 16 Nov 2016 11:25:00 +0100
> 
> Martin Steigerwald <martin.steigerw...@teamix.de> wrote:
> > merkaba:~> mount -o degraded,clear_cache /dev/satafp1/backup /mnt/zeit
> > mount: wrong filesystem type, invalid options, the
> > superblock of /dev/mapper/satafp1-backup is damaged, missing
> > codepage or another error
> > 
> >   Sometimes the system log provides valuable information –
> >   try  dmesg | tail  or similar
> > 
> > merkaba:~#32> dmesg | tail -6
> > [ 3080.120687] BTRFS info (device dm-13): allowing degraded mounts
> > [ 3080.120699] BTRFS info (device dm-13): force clearing of disk cache
> > [ 3080.120703] BTRFS info (device dm-13): disk space caching is
> > enabled
> > [ 3080.120706] BTRFS info (device dm-13): has skinny extents
> > [ 3080.150957] BTRFS warning (device dm-13): missing devices (1)
> > exceeds the limit (0), writeable mount is not allowed
> > [ 3080.195941] BTRFS: open_ctree failed
> 
> I have to wonder did you read the above message? What you need at this point
> is simply "-o degraded,ro". But I don't see that tried anywhere down the
> line.
> 
> See also (or try): https://patchwork.kernel.org/patch/9419189/

Actually I read that one, but I read more into it than what it was saying:

I read into it that BTRFS would automatically use a read only mount.


merkaba:~> mount -o degraded,ro /dev/satafp1/daten /mnt/zeit

actually really works. *Thank you*, Roman.


I do think that the above kernel messages invite that kind of interpretation,
though. I took the "BTRFS: open_ctree failed" message as indicative of some
structural issue with the filesystem.


So mounting works, although for some reason scrubbing is aborted (I had this
issue a long time ago on my laptop as well). After removing the /var/lib/btrfs 
scrub status file for the filesystem:

merkaba:~> btrfs scrub start /mnt/zeit
scrub started on /mnt/zeit, fsid […] (pid=9054)
merkaba:~> btrfs scrub status /mnt/zeit
scrub status for […]
scrub started at Wed Nov 16 11:52:56 2016 and was aborted after 
00:00:00
total bytes scrubbed: 0.00B with 0 errors

Anyway, I will now just rsync off the files.

Interestingly enough btrfs restore complained about looping over certain
files… let's see whether the rsync or btrfs send/receive goes through.

Ciao,

-- 
Martin Steigerwald  | Trainer, teamix GmbH



degraded BTRFS RAID 1 not mountable: open_ctree failed, unable to find block group for 0

2016-11-16 Thread Martin Steigerwald
ice dm-13): has skinny extents
[ 3080.150957] BTRFS warning (device dm-13): missing devices (1) exceeds 
the limit (0), writeable mount is not allowed
[ 3080.195941] BTRFS: open_ctree failed

merkaba:~> mount -o degraded,clear_cache,usebackuproot /dev/satafp1/backup /mnt/zeit
mount: wrong filesystem type, invalid options, the superblock of
/dev/mapper/satafp1-backup is damaged, missing codepage or another error

  Sometimes the system log provides valuable information –
  try  dmesg | tail  or similar

merkaba:~> dmesg | tail -7
[ 3173.784713] BTRFS info (device dm-13): allowing degraded mounts
[ 3173.784728] BTRFS info (device dm-13): force clearing of disk cache
[ 3173.784737] BTRFS info (device dm-13): trying to use backup root at 
mount time
[ 3173.784742] BTRFS info (device dm-13): disk space caching is enabled
[ 3173.784746] BTRFS info (device dm-13): has skinny extents
[ 3173.816983] BTRFS warning (device dm-13): missing devices (1) exceeds 
the limit (0), writeable mount is not allowed
[ 3173.865199] BTRFS: open_ctree failed

I aborted repairing after this assert:

merkaba:~#130> btrfs check --repair /dev/satafp1/backup &| stdbuf -oL tee 
btrfs-check-repair-satafp1-backup.log
enabling repair mode
warning, device 2 is missing
Checking filesystem on /dev/satafp1/backup
UUID: 01cf0493-476f-42e8-8905-61ef205313db
checking extents
Unable to find block group for 0
extent-tree.c:289: find_search_start: Assertion `1` failed.
btrfs[0x43e418]
btrfs(btrfs_reserve_extent+0x5c9)[0x4425df]
btrfs(btrfs_alloc_free_block+0x63)[0x44297c]
btrfs(__btrfs_cow_block+0xfc)[0x436636]
btrfs(btrfs_cow_block+0x8b)[0x436bd8]
btrfs[0x43ad82]
btrfs(btrfs_commit_transaction+0xb8)[0x43c5dc]
btrfs[0x4268b4]
btrfs(cmd_check+0x)[0x427d6d]
btrfs(main+0x12f)[0x40a341]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1)[0x7fb2e6bec2b1]
btrfs(_start+0x2a)[0x40a37a]

merkaba:~#130> btrfs --version
btrfs-progs v4.7.3

(Honestly I think asserts like this need to be gone from btrfs-tools for good)

About this I only found this unanswered mailing list post:

btrfs-convert: Unable to find block group for 0
Date: Fri, 24 Jun 2016 11:09:27 +0200
https://www.spinics.net/lists/linux-btrfs/msg56478.html


Out of curiosity I tried:

merkaba:~#1> btrfs rescue zero-log //dev/satafp1/daten
warning, device 2 is missing
Clearing log on //dev/satafp1/daten, previous log_root 0, level 0
Unable to find block group for 0
extent-tree.c:289: find_search_start: Assertion `1` failed.
btrfs[0x43e418]
btrfs(btrfs_reserve_extent+0x5c9)[0x4425df]
btrfs(btrfs_alloc_free_block+0x63)[0x44297c]
btrfs(__btrfs_cow_block+0xfc)[0x436636]
btrfs(btrfs_cow_block+0x8b)[0x436bd8]
btrfs[0x43ad82]
btrfs(btrfs_commit_transaction+0xb8)[0x43c5dc]
btrfs[0x42c0d4]
btrfs(main+0x12f)[0x40a341]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1)[0x7fb2f16a82b1]
btrfs(_start+0x2a)[0x40a37a]

(I didn't expect much, as this is an issue that AFAIK does not happen
easily anymore, but I also thought it could not do much harm)

Superblocks themselves seem to be sane:

merkaba:~#1> btrfs rescue super-recover //dev/satafp1/daten
All supers are valid, no need to recover

So "btrfs restore" it is:

merkaba:[…]> btrfs restore -mxs /dev/satafp1/daten daten-restore

This prints out a ton of:

Trying another mirror
Trying another mirror

But it actually works – somewhat. I now just got

Trying another mirror
We seem to be looping a lot on 
daten-restore/[…]/virtualbox-4.1.18-dfsg/out/lib/vboxsoap.a, do you want to 
keep going on ? (y/N/a):

after about 35 GiB of data restored. I answered no to this one and now it is
at about 53 GiB already. I just got another one of these, but also not 
concerning a file I actually need.

Thanks,

-- 
Martin Steigerwald  | Trainer, teamix GmbH



Re: stability matrix

2016-09-15 Thread Martin Steigerwald
On Thursday, 15 September 2016, 07:54:26 CEST, Austin S. Hemmelgarn wrote:
> On 2016-09-15 05:49, Hans van Kranenburg wrote:
> > On 09/15/2016 04:14 AM, Christoph Anton Mitterer wrote:
[…]
> I specifically do not think we should worry about distro kernels though.
>   If someone is using a specific distro, that distro's documentation
> should cover what they support and what works and what doesn't.  Some
> (like Arch and to a lesser extent Gentoo) use almost upstream kernels,
> so there's very little point in tracking them.  Some (like Ubuntu and
> Debian) use almost upstream LTS kernels, so there's little point
> tracking them either.  Many others though (like CentOS, RHEL, and OEL)
> Use forked kernels that have so many back-ported patches that it's
> impossible to track up-date to up-date what the hell they've got.  A
> rather ridiculous expression regarding herding of cats comes to mind
> with respect to the last group.

Yep. I just read through the RHEL release notes for a RHEL 7 workshop I will hold 
for a customer… and noted that newer RHEL 7 kernels for example have the device 
mapper from kernel 4.1 (while the kernel still says it's a 3.10 one), XFS from 
kernel this.that, including the new incompat CRC disk format and the need to also 
upgrade xfsprogs in lockstep, and this and that from kernel this.that, and so 
on. Frankenstein comes to mind as an association, but I bet the RHEL kernel 
engineers know what they are doing.

-- 
Martin


Re: Is stability a joke?

2016-09-15 Thread Martin Steigerwald
On Thursday, 15 September 2016, 07:55:36 CEST, Kai Krakow wrote:
> Am Mon, 12 Sep 2016 08:20:20 -0400
> 
> schrieb "Austin S. Hemmelgarn" <ahferro...@gmail.com>:
> > On 2016-09-11 09:02, Hugo Mills wrote:
> > > On Sun, Sep 11, 2016 at 02:39:14PM +0200, Waxhead wrote:
> > >> Martin Steigerwald wrote:
> >  [...]
> >  [...]
> >  [...]
> >  [...]
> >  
> > >> That is exactly the same reason I don't edit the wiki myself. I
> > >> could of course get it started and hopefully someone will correct
> > >> what I write, but I feel that if I start this off I don't have deep
> > >> enough knowledge to do a proper start. Perhaps I will change my
> > >> mind about this.
> > >> 
> > >Given that nobody else has done it yet, what are the odds that
> > > 
> > > someone else will step up to do it now? I would say that you should
> > > at least try. Yes, you don't have as much knowledge as some others,
> > > but if you keep working at it, you'll gain that knowledge. Yes,
> > > you'll probably get it wrong to start with, but you probably won't
> > > get it *very* wrong. You'll probably get it horribly wrong at some
> > > point, but even the more knowledgable people you're deferring to
> > > didn't identify the problems with parity RAID until Zygo and Austin
> > > and Chris (and others) put in the work to pin down the exact
> > > issues.
> > 
> > FWIW, here's a list of what I personally consider stable (as in, I'm
> > willing to bet against reduced uptime to use this stuff on production
> > systems at work and personal systems at home):
> > 1. Single device mode, including DUP data profiles on single device
> > without mixed-bg.
> > 2. Multi-device raid0, raid1, and raid10 profiles with symmetrical
> > devices (all devices are the same size).
> > 3. Multi-device single profiles with asymmetrical devices.
> > 4. Small numbers (max double digit) of snapshots, taken at infrequent
> > intervals (no more than once an hour).  I use single snapshots
> > regularly to get stable images of the filesystem for backups, and I
> > keep hourly ones of my home directory for about 48 hours.
> > 5. Subvolumes used to isolate parts of a filesystem from snapshots.
> > I use this regularly to isolate areas of my filesystems from backups.
> > 6. Non-incremental send/receive (no clone source, no parent's, no
> > deduplication).  I use this regularly for cloning virtual machines.
> > 7. Checksumming and scrubs using any of the profiles I've listed
> > above. 8. Defragmentation, including autodefrag.
> > 9. All of the compat_features, including no-holes and skinny-metadata.
> > 
> > Things I consider stable enough that I'm willing to use them on my
> > personal systems but not systems at work:
> > 1. In-line data compression with compress=lzo.  I use this on my
> > laptop and home server system.  I've never had any issues with it
> > myself, but I know that other people have, and it does seem to make
> > other things more likely to have issues.
> > 2. Batch deduplication.  I only use this on the back-end filesystems
> > for my personal storage cluster, and only because I have multiple
> > copies as a result of GlusterFS on top of BTRFS.  I've not had any
> > significant issues with it, and I don't remember any reports of data
> > loss resulting from it, but it's something that people should not be
> > using if they don't understand all the implications.
> 
> I could at least add one "don't do it":
> 
> Don't use BFQ patches (it's an IO scheduler) if you're using btrfs.
> Some people like to use it especially for running VMs and desktops
> because it provides very good interactivity while maintaining very good
> throughput. But it completely destroyed my btrfs beyond repair at least
> twice, either while actually using a VM (in VirtualBox) or during high
> IO loads. I now stick to the deadline scheduler instead which provides
> very good interactivity for me, too, and the corruptions didn't occur
> again so far.
> 
> The story with BFQ has always been the same: System suddenly freezes
> during moderate to high IO until all processes stop working (no process
> shows D state, tho). Only hard reboot possible. After rebooting, access
> to some (unrelated) files may fail with "errno=-17 Object already
> exists" which cannot be repaired. If it affects files needed during
> boot, you are screwed because file system goes RO.

This could be a further row in the table. And well…

as for CFQ, Jens Axboe is currently working on bandwidth throttling patches 
*exactly* for the reason of providing more interactivity and fairness between 
I/O operations.

Right now, "Completely Fair" in CFQ is a *huge* exaggeration, at least while you 
have a dd bs=1M thing running.
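(For anyone who wants to follow Kai's advice and try the deadline scheduler, the 
switch is a sysfs write – device name is only an example, on blk-mq kernels the 
scheduler is called mq-deadline, and the setting does not survive a reboot:)

cat /sys/block/sda/queue/scheduler
echo deadline > /sys/block/sda/queue/scheduler   # as root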

Thanks,
-- 
Martin


Re: Is stability a joke?

2016-09-15 Thread Martin Steigerwald
Hello Nicholas.

On Wednesday, 14 September 2016, 21:05:52 CEST, Nicholas D Steeves wrote:
> On Mon, Sep 12, 2016 at 08:20:20AM -0400, Austin S. Hemmelgarn wrote:
> > On 2016-09-11 09:02, Hugo Mills wrote:
[…]
> > As far as documentation though, we [BTRFS] really do need to get our act
> > together.  It really doesn't look good to have most of the best
> > documentation be in the distro's wikis instead of ours.  I'm not trying to
> > say the distros shouldn't be documenting BTRFS, but the point at which
> > Debian (for example) has better documentation of the upstream version of
> > BTRFS than the upstream project itself does, that starts to look bad.
> 
> I would have loved to have this feature-to-stability list when I
> started working on the Debian documentation!  I started it because I
> was saddened by number of horror story "adventures with btrfs"
> articles and posts I had read about, combined with the perspective of
> certain members within the Debian community that it was a toy fs.
> 
> Are my contributions to that wiki of a high enough quality that I
> can work on the upstream one?  Do you think the broader btrfs
> community is interested in citations and curated links to discussions?
> 
> eg: if a company wants to use btrfs, they check the status page, see a
> feature they want is still in the yellow zone of stabilisation, and
> then follow the links to familiarise themselves with past discussions.
> I imagine this would also help individuals or grad students more
> quickly familiarise themselves with the available literature before
> choosing a specific project.  If regular updates from SUSE, STRATO,
> Facebook, and Fujitsu are also publicly available the k.org wiki would
> be a wonderful place to syndicate them!

I definitely think the quality of your contributions is high enough, and others 
can also proofread and add their experiences, so… by *all* means, go ahead 
*already*.

It won't all fit inside the table directly, I bet, *but* you can use 
footnotes or further explanations for features that need them, with a 
headline per feature below the table and a link to it from within the table.

Thank you!
-- 
Martin


Re: Is stability a joke? (wiki updated)

2016-09-13 Thread Martin Steigerwald
On Tuesday, 13 September 2016, 07:28:38 CEST, Austin S. Hemmelgarn wrote:
> On 2016-09-12 16:44, Chris Murphy wrote:
> > On Mon, Sep 12, 2016 at 2:35 PM, Martin Steigerwald <mar...@lichtvoll.de> 
wrote:
> >> Am Montag, 12. September 2016, 23:21:09 CEST schrieb Pasi Kärkkäinen:
> >>> On Mon, Sep 12, 2016 at 09:57:17PM +0200, Martin Steigerwald wrote:
> >>>> Am Montag, 12. September 2016, 18:27:47 CEST schrieb David Sterba:
> >>>>> On Mon, Sep 12, 2016 at 04:27:14PM +0200, David Sterba wrote:
[…]
> >>>>> https://btrfs.wiki.kernel.org/index.php/Status
> >>>> 
> >>>> Great.
> >>>> 
> >>>> I made to minor adaption. I added a link to the Status page to my
> >>>> warning
> >>>> in before the kernel log by feature page. And I also mentioned that at
> >>>> the time the page was last updated the latest kernel version was 4.7.
> >>>> Yes, thats some extra work to update the kernel version, but I think
> >>>> its
> >>>> beneficial to explicitely mention the kernel version the page talks
> >>>> about. Everyone who updates the page can update the version within a
> >>>> second.
> >>> 
> >>> Hmm.. that will still leave people wondering "but I'm running Linux 4.4,
> >>> not 4.7, I wonder what the status of feature X is.."
> >>> 
> >>> Should we also add a column for kernel version, so we can add "feature X
> >>> is
> >>> known to be OK on Linux 3.18 and later"..  ? Or add those to "notes"
> >>> field,
> >>> where applicable?
> >> 
> >> That was my initial idea, and it may be better than a generic kernel
> >> version for all features. Even if we fill in 4.7 for any of the features
> >> that are known to work okay for the table.
> >> 
> >> For RAID 1 I am willing to say it works stable since kernel 3.14, as this
> >> was the kernel I used when I switched /home and / to Dual SSD RAID 1 on
> >> this ThinkPad T520.
> > 
> > Just to cut yourself some slack, you could skip 3.14 because it's EOL
> > now, and just go from 4.4.
> 
> That reminds me, we should probably make a point to make it clear that
> this is for the _upstream_ mainline kernel versions, not for versions
> from some arbitrary distro, and that people should check the distro's
> documentation for that info.

I'd do the following:

Really state the first kernel version known to work stably for each feature.

But before the table, state this:

1) Instead of relying on the first kernel known to work stably for a feature, 
recommend using the latest upstream kernel, or alternatively the latest upstream 
LTS kernel for those users who want to play it a bit safer.

2) For stable distros such as SLES, RHEL, Ubuntu LTS and Debian Stable, recommend 
checking the distro documentation. Note that some distro kernels track upstream 
kernels quite closely, like the Debian backports kernel or the Ubuntu kernel 
backports PPA.

Thanks,
-- 
Martin


Re: Is stability a joke? (wiki updated)

2016-09-12 Thread Martin Steigerwald
On Monday, 12 September 2016, 23:21:09 CEST, Pasi Kärkkäinen wrote:
> On Mon, Sep 12, 2016 at 09:57:17PM +0200, Martin Steigerwald wrote:
> > Am Montag, 12. September 2016, 18:27:47 CEST schrieb David Sterba:
> > > On Mon, Sep 12, 2016 at 04:27:14PM +0200, David Sterba wrote:
> > > > > I therefore would like to propose that some sort of feature /
> > > > > stability
> > > > > matrix for the latest kernel is added to the wiki preferably
> > > > > somewhere
> > > > > where it is easy to find. It would be nice to archive old matrix'es
> > > > > as
> > > > > well in case someone runs on a bit older kernel (we who use Debian
> > > > > tend
> > > > > to like older kernels). In my opinion it would make things bit
> > > > > easier
> > > > > and perhaps a bit less scary too. Remember if you get bitten badly
> > > > > once
> > > > > you tend to stay away from from it all just in case, if you on the
> > > > > other
> > > > > hand know what bites you can safely pet the fluffy end instead :)
> > > > 
> > > > Somebody has put that table on the wiki, so it's a good starting
> > > > point.
> > > > I'm not sure we can fit everything into one table, some combinations
> > > > do
> > > > not bring new information and we'd need n-dimensional matrix to get
> > > > the
> > > > whole picture.
> > > 
> > > https://btrfs.wiki.kernel.org/index.php/Status
> > 
> > Great.
> > 
> > I made to minor adaption. I added a link to the Status page to my warning
> > in before the kernel log by feature page. And I also mentioned that at
> > the time the page was last updated the latest kernel version was 4.7.
> > Yes, thats some extra work to update the kernel version, but I think its
> > beneficial to explicitely mention the kernel version the page talks
> > about. Everyone who updates the page can update the version within a
> > second.
> 
> Hmm.. that will still leave people wondering "but I'm running Linux 4.4, not
> 4.7, I wonder what the status of feature X is.."
> 
> Should we also add a column for kernel version, so we can add "feature X is
> known to be OK on Linux 3.18 and later"..  ? Or add those to "notes" field,
> where applicable?

That was my initial idea, and it may be better than one generic kernel version for all features – even if, for a start, we just fill in 4.7 for the features that are known to work okay.

For RAID 1 I am willing to say it has worked stably since kernel 3.14, as this was the kernel I used when I switched /home and / to dual-SSD RAID 1 on this ThinkPad T520.


-- 
Martin


Re: Is stability a joke? (wiki updated)

2016-09-12 Thread Martin Steigerwald
On Monday, 12 September 2016, 18:27:47 CEST, David Sterba wrote:
> On Mon, Sep 12, 2016 at 04:27:14PM +0200, David Sterba wrote:
> > > I therefore would like to propose that some sort of feature / stability
> > > matrix for the latest kernel is added to the wiki preferably somewhere
> > > where it is easy to find. It would be nice to archive old matrix'es as
> > > well in case someone runs on a bit older kernel (we who use Debian tend
> > > to like older kernels). In my opinion it would make things bit easier
> > > and perhaps a bit less scary too. Remember if you get bitten badly once
> > > you tend to stay away from from it all just in case, if you on the other
> > > hand know what bites you can safely pet the fluffy end instead :)
> > 
> > Somebody has put that table on the wiki, so it's a good starting point.
> > I'm not sure we can fit everything into one table, some combinations do
> > not bring new information and we'd need n-dimensional matrix to get the
> > whole picture.
> 
> https://btrfs.wiki.kernel.org/index.php/Status

Great.

I made two minor adaptations. I added a link to the Status page to my warning before the table on the changelog-by-feature page. And I also mentioned that, at the time the page was last updated, the latest kernel version was 4.7. Yes, that's some extra work whenever the kernel version changes, but I think it's beneficial to explicitly mention the kernel version the page talks about. Everyone who updates the page can update the version within a second.

-- 
Martin


Re: Small fs

2016-09-11 Thread Martin Steigerwald
On Sunday, 11 September 2016, 19:46:32 CEST, Hugo Mills wrote:
> On Sun, Sep 11, 2016 at 09:13:28PM +0200, Martin Steigerwald wrote:
> > Am Sonntag, 11. September 2016, 16:44:23 CEST schrieb Duncan:
> > > * Metadata, and thus mixed-bg, defaults to DUP mode on a single-device
> > > filesystem (except on ssd where I actually still use it myself, and
> > > recommend it except for ssds that do firmware dedupe).  In mixed-mode
> > > this means two copies of data as well, which halves the usable space.
> > > 
> > > IOW, when using mixed-mode, which is recommended under a gig, and dup
> > > replication which is then the single-device default, effective usable
> > > space is **HALVED**, so 256 MiB btrfs size becomes 128 MiB usable. (!!)
> > 
> > I don´t get this part. That is just *metadata* being duplicated, not the
> > actual *data* inside the files. Or am I missing something here?
> 
>In mixed mode, there's no distinction: Data and metadata both use
> the same chunks. If those chunks are DUP, then both data and metadata
> are duplicated, and you get half the space available.

In German I'd say "autsch" to this, in English – according to pda.leo.org – "ouch".

Okay, I just erased the idea of using mixed mode from my mind altogether :).

Just like I think I will never use a BTRFS filesystem below 5 GiB. Well, with one exception: maybe on the eMMC flash of the new Omnia Turris router that I hope will arrive at my place soon.

-- 
Martin


compress=lzo safe to use? (was: Re: Trying to rescue my data :()

2016-09-11 Thread Martin Steigerwald
On Sunday, 26 June 2016, 13:13:04 CEST, Steven Haigh wrote:
> On 26/06/16 12:30, Duncan wrote:
> > Steven Haigh posted on Sun, 26 Jun 2016 02:39:23 +1000 as excerpted:
> >> In every case, it was a flurry of csum error messages, then instant
> >> death.
> > 
> > This is very possibly a known bug in btrfs, that occurs even in raid1
> > where a later scrub repairs all csum errors.  While in theory btrfs raid1
> > should simply pull from the mirrored copy if its first try fails checksum
> > (assuming the second one passes, of course), and it seems to do this just
> > fine if there's only an occasional csum error, if it gets too many at
> > once, it *does* unfortunately crash, despite the second copy being
> > available and being just fine as later demonstrated by the scrub fixing
> > the bad copy from the good one.
> > 
> > I'm used to dealing with that here any time I have a bad shutdown (and
> > I'm running live-git kde, which currently has a bug that triggers a
> > system crash if I let it idle and shut off the monitors, so I've been
> > getting crash shutdowns and having to deal with this unfortunately often,
> > recently).  Fortunately I keep my root, with all system executables, etc,
> > mounted read-only by default, so it's not affected and I can /almost/
> > boot normally after such a crash.  The problem is /var/log and /home
> > (which has some parts of /var that need to be writable symlinked into /
> > home/var, so / can stay read-only).  Something in the normal after-crash
> > boot triggers enough csum errors there that I often crash again.
> > 
> > So I have to boot to emergency mode and manually mount the filesystems in
> > question, so nothing's trying to access them until I run the scrub and
> > fix the csum errors.  Scrub itself doesn't trigger the crash, thankfully,
> > and once it has repaired all the csum errors due to partial writes on one
> > mirror that either were never made or were properly completed on the
> > other mirror, I can exit emergency mode and complete the normal boot (to
> > the multi-user default target).  As there's no more csum errors then
> > because scrub fixed them all, the boot doesn't crash due to too many such
> > errors, and I'm back in business.
> > 
> > 
> > Tho I believe at least the csum bug that affects me may only trigger if
> > compression is (or perhaps has been in the past) enabled.  Since I run
> > compress=lzo everywhere, that would certainly affect me.  It would also
> > explain why the bug has remained around for quite some time as well,
> > since presumably the devs don't run with compression on enough for this
> > to have become a personal itch they needed to scratch, thus its remaining
> > untraced and unfixed.
> > 
> > So if you weren't using the compress option, your bug is probably
> > different, but either way, the whole thing about too many csum errors at
> > once triggering a system crash sure does sound familiar, here.
> 
> Yes, I was running the compress=lzo option as well... Maybe here lays a
> common problem?

Hmm… I found this thread after being referred to it by the Debian wiki page on BTRFS¹.

I have been using compress=lzo on BTRFS RAID 1 since April 2014 and never found an issue. Steven, your filesystem wasn't RAID 1 but RAID 5 or 6?

I just want to assess whether using compress=lzo might be dangerous in my setup. Actually, right now I'd like to keep using it, since I think at least one of the SSDs does not compress on its own. And… well… /home and /, where I use it, are both quite full already.

[1] https://wiki.debian.org/Btrfs#WARNINGS

Thanks,
-- 
Martin


Re: Small fs

2016-09-11 Thread Martin Steigerwald
On Sunday, 11 September 2016, 21:56:07 CEST, Imran Geriskovan wrote:
> On 9/11/16, Duncan <1i5t5.dun...@cox.net> wrote:
> > Martin Steigerwald posted on Sun, 11 Sep 2016 17:32:44 +0200 as excerpted:
> >>> What is the smallest recommended fs size for btrfs?
> >>> Can we say size should be in multiples of 64MB?
> >> 
> >> Do you want to know the smalled *recommended* or the smallest *possible*
> >> size?
> 
> In fact both.
> I'm reconsidering my options for /boot

Well, my stance on /boot still is: Ext4. Done.

:)

It just does not bother me. It practically makes no difference at all. It has no visible effect on my user experience, and I never saw the need to snapshot /boot.

But another approach, in case you want to use BTRFS for /boot, is to use a subvolume. That's IMHO the SLES 12 default setup. They basically create subvolumes for /boot, /var, /var/lib/mysql – you name it. Big advantage: you have one big FS and do not need to plan space for partitions or LVs. Disadvantage: if it breaks, it breaks.

That said, I think at a new installation I may do this for /boot: just put it inside a subvolume.
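
If someone wants to go that route, a minimal sketch could look like this – paths and the subvolume name are just examples, adapt them to your setup:

# on the mounted top level of the BTRFS filesystem
btrfs subvolume create /mnt/btrfs-top/@boot

# then mount it via fstab using the subvol= option, roughly:
# UUID=<fs-uuid>  /boot  btrfs  defaults,subvol=@boot  0  0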

From my experiences at work with customer systems, and even some systems I set up myself, I often do not use small partitions anymore. I did so for a CentOS 7 training VM: just 2 GiB of XFS for /var. Guess what happened? The last update was too long ago, so… yum tried to download a ton of packages and then complained it did not have enough space in /var. Luckily I used LVM, so I enlarged the partition LVM resides on, enlarged the PV and then enlarged /var. There may be valid reasons to split things up, and I am quite comfortable with splitting /boot out, cause its size is, well, easy enough to plan. And it may make sense to split /var or /var/log out. But on BTRFS I would likely use subvolumes. The only thing I might separate would be /home, to make it easier to keep it around on a re-installation of the OS. That said, I never reinstalled the Debian on this ThinkPad T520 since I initially installed it. And on previous laptops I even copied the Debian from the older laptop onto the newer one. With the T520 I reinstalled, cause I wanted to switch to 64 bit cleanly.
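
For reference, the LVM dance back then was roughly this – a sketch from memory, device, VG and LV names are made up:

# grow the partition the PV lives on first (fdisk/parted), then:
pvresize /dev/vda2                  # let LVM see the bigger partition
lvextend -L +2G /dev/centos/var     # grow the LV that holds /var
xfs_growfs /var                     # grow the mounted XFS filesystem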

-- 
Martin


Re: Small fs

2016-09-11 Thread Martin Steigerwald
On Sunday, 11 September 2016, 16:44:23 CEST, Duncan wrote:
> * Metadata, and thus mixed-bg, defaults to DUP mode on a single-device 
> filesystem (except on ssd where I actually still use it myself, and 
> recommend it except for ssds that do firmware dedupe).  In mixed-mode 
> this means two copies of data as well, which halves the usable space.
> 
> IOW, when using mixed-mode, which is recommended under a gig, and dup 
> replication which is then the single-device default, effective usable 
> space is **HALVED**, so 256 MiB btrfs size becomes 128 MiB usable. (!!)

I don't get this part. That is just *metadata* being duplicated, not the actual *data* inside the files. Or am I missing something here?

-- 
Martin


Re: Small fs

2016-09-11 Thread Martin Steigerwald
On Sunday, 11 September 2016, 18:27:30 CEST, you wrote:
> What is the smallest recommended fs size for btrfs?
> 
> - There are mentions of 256MB around the net.
> - Gparted reserves minimum of 256MB for btrfs.
> 
> With an ordinary partition on a single disk,
> fs created with just "mkfs.btrfs /dev/sdxx":
> - 128MB works fine.
> - 127MB works but as if it is 64MB.
> 
> Can we say size should be in multiples of 64MB?

Do you want to know the smallest *recommended* or the smallest *possible* size?

I personally wouldn't go below one or two GiB or so with BTRFS. On small filesystems – I don't know the threshold right now – it uses a mixed metadata/data format. And I think using smaller BTRFS filesystems invites any leftover "filesystem is full while it isn't" issues.

Well there we go. Excerpt from mkfs.btrfs(8) manpage:

   -M|--mixed
   Normally the data and metadata block groups are isolated.
   The mixed mode will remove the isolation and store both
   types in the same block group type. This helps to utilize
   the free space regardless of the purpose and is suitable
   for small devices. The separate allocation of block groups
   leads to a situation where the space is reserved for the
   other block group type, is not available for allocation and
   can lead to ENOSPC state.

   The recommended size for the mixed mode is for filesystems
   less than 1GiB. The soft recommendation is to use it for
   filesystems smaller than 5GiB. The mixed mode may lead to
   degraded performance on larger filesystems, but is
   otherwise usable, even on multiple devices.

   The nodesize and sectorsize must be equal, and the block
   group types must match.

   Note
   versions up to 4.2.x forced the mixed mode for devices
   smaller than 1GiB. This has been removed in 4.3+ as it
   caused some usability issues.
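
So if someone wants to experiment with a really small BTRFS, a quick sketch with a loop device could be – sizes and paths just as an example:

truncate -s 512M /tmp/small-btrfs.img
mkfs.btrfs --mixed /tmp/small-btrfs.img
mkdir -p /mnt/test
mount -o loop /tmp/small-btrfs.img /mnt/test
btrfs fi df /mnt/test    # should show a combined "Data+Metadata" block group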

Thanks
-- 
Martin


Re: Is stability a joke?

2016-09-11 Thread Martin Steigerwald
On Sunday, 11 September 2016, 16:54:25 CEST, you wrote:
> Am Sonntag, 11. September 2016, 14:39:14 CEST schrieb Waxhead:
> > Martin Steigerwald wrote:
> > > Am Sonntag, 11. September 2016, 13:43:59 CEST schrieb Martin 
Steigerwald:
> > >>>>> The Nouveau graphics driver have a nice feature matrix on it's
> > >>>>> webpage
> > >>>>> and I think that BTRFS perhaps should consider doing something like
> > >>>>> that
> > >>>>> on it's official wiki as well
> > >>>> 
> > >>>> BTRFS also has a feature matrix. The links to it are in the "News"
> > >>>> section
> > >>>> however:
> > >>>> 
> > >>>> https://btrfs.wiki.kernel.org/index.php/Changelog#By_feature
> 
> […]
> 
> > > I mentioned this matrix as a good *starting* point. And I think it would
> > > be
> > > easy to extent it:
> > > 
> > > Just add another column called "Production ready". Then research / ask
> > > about production stability of each feature. The only challenge is: Who
> > > is
> > > authoritative on that? I´d certainly ask the developer of a feature, but
> > > I´d also consider user reports to some extent.
> > > 
> > > Maybe thats the real challenge.
> > > 
> > > If you wish, I´d go through each feature there and give my own
> > > estimation.
> > > But I think there are others who are deeper into this.
> > 
> > That is exactly the same reason I don't edit the wiki myself. I could of
> > course get it started and hopefully someone will correct what I write,
> > but I feel that if I start this off I don't have deep enough knowledge
> > to do a proper start. Perhaps I will change my mind about this.
> 
> Well one thing would be to start with the column and start filling the more
> easy stuff. And if its not known since what kernel version, but its known to
> be stable I suggest to conservatively just put the first kernel version
> into it where people think it is stable or in doubt even put 4.7 into it.
> It can still be reduced to lower kernel versions.
> 
> Well: I made a tiny start. I linked "Features by kernel version" more
> prominently on the main page, so it is easier to find and also added the
> following warning just above the table:
> 
> "WARNING: The "Version" row states at which version a feature has been
> merged into the mainline kernel. It does not tell anything about at which
> kernel version it is considered mature enough for production use."
> 
> Now I wonder: Would adding a "Production ready" column, stating the first
> known to be stable kernel version make sense in this table? What do you
> think? I can add the column and give some first rough, conservative
> estimations on a few features.
> 
> What do you think? Is this a good place?

It isn't as straightforward to add this column as I thought. If I add it after "Version", the following fields are not aligned anymore, even though they use some kind of identifier – but that identifier also does not match the header title. After reading about the MediaWiki syntax I came to the conclusion that I need to add the new column to every data row as well and cannot just assign values to some rows and leave out what is not known yet.

! Feature !! Version !! Description !! Notes
{{FeatureMerged
|name=scrub
|version=3.0
|text=Read all data and verify checksums, repair if possible.
}}
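
What I had in mind would be roughly the following – note that the |stable= parameter is hypothetical, the FeatureMerged template itself would have to be extended first to know about it:

! Feature !! Version !! Production ready !! Description !! Notes
{{FeatureMerged
|name=scrub
|version=3.0
|stable=<first kernel considered production ready>
|text=Read all data and verify checksums, repair if possible.
}}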

Thanks,
-- 
Martin


Re: Is stability a joke?

2016-09-11 Thread Martin Steigerwald
On Sunday, 11 September 2016, 13:02:21 CEST, Hugo Mills wrote:
> On Sun, Sep 11, 2016 at 02:39:14PM +0200, Waxhead wrote:
> > Martin Steigerwald wrote:
> > >Am Sonntag, 11. September 2016, 13:43:59 CEST schrieb Martin Steigerwald:
> > >>>>Thing is: This just seems to be when has a feature been implemented
> > >>>>matrix.
> > >>>>Not when it is considered to be stable. I think this could be done
> > >>>>with
> > >>>>colors or so. Like red for not supported, yellow for implemented and
> > >>>>green for production ready.
> > >>>
> > >>>Exactly, just like the Nouveau matrix. It clearly shows what you can
> > >>>expect from it.
> > >
> > >I mentioned this matrix as a good *starting* point. And I think it would
> > >be
> > >easy to extent it:
> > >
> > >Just add another column called "Production ready". Then research / ask
> > >about production stability of each feature. The only challenge is: Who
> > >is authoritative on that? I´d certainly ask the developer of a feature,
> > >but I´d also consider user reports to some extent.
> > >
> > >Maybe thats the real challenge.
> > >
> > >If you wish, I´d go through each feature there and give my own
> > >estimation. But I think there are others who are deeper into this.
> > 
> > That is exactly the same reason I don't edit the wiki myself. I
> > could of course get it started and hopefully someone will correct
> > what I write, but I feel that if I start this off I don't have deep
> > enough knowledge to do a proper start. Perhaps I will change my mind
> > about this.
> 
>Given that nobody else has done it yet, what are the odds that
> someone else will step up to do it now? I would say that you should at
> least try. Yes, you don't have as much knowledge as some others, but
> if you keep working at it, you'll gain that knowledge. Yes, you'll
> probably get it wrong to start with, but you probably won't get it
> *very* wrong. You'll probably get it horribly wrong at some point, but
> even the more knowledgable people you're deferring to didn't identify
> the problems with parity RAID until Zygo and Austin and Chris (and
> others) put in the work to pin down the exact issues.
> 
>So I'd strongly encourage you to set up and maintain the stability
> matrix yourself -- you have the motivation at least, and the knowledge
> will come with time and effort. Just keep reading the mailing list and
> IRC and bugzilla, and try to identify where you see lots of repeated
> problems, and where bugfixes in those areas happen.
> 
>So, go for it. You have a lot to offer the community.

Yep! Fully agreed.

-- 
Martin


Re: Is stability a joke?

2016-09-11 Thread Martin Steigerwald
On Sunday, 11 September 2016, 14:39:14 CEST, Waxhead wrote:
> Martin Steigerwald wrote:
> > Am Sonntag, 11. September 2016, 13:43:59 CEST schrieb Martin Steigerwald:
> >>>>> The Nouveau graphics driver have a nice feature matrix on it's webpage
> >>>>> and I think that BTRFS perhaps should consider doing something like
> >>>>> that
> >>>>> on it's official wiki as well
> >>>> 
> >>>> BTRFS also has a feature matrix. The links to it are in the "News"
> >>>> section
> >>>> however:
> >>>> 
> >>>> https://btrfs.wiki.kernel.org/index.php/Changelog#By_feature
[…]
> > I mentioned this matrix as a good *starting* point. And I think it would
> > be
> > easy to extent it:
> > 
> > Just add another column called "Production ready". Then research / ask
> > about production stability of each feature. The only challenge is: Who is
> > authoritative on that? I´d certainly ask the developer of a feature, but
> > I´d also consider user reports to some extent.
> > 
> > Maybe thats the real challenge.
> > 
> > If you wish, I´d go through each feature there and give my own estimation.
> > But I think there are others who are deeper into this.
> 
> That is exactly the same reason I don't edit the wiki myself. I could of
> course get it started and hopefully someone will correct what I write,
> but I feel that if I start this off I don't have deep enough knowledge
> to do a proper start. Perhaps I will change my mind about this.

Well, one thing would be to start with the column and begin filling in the easier stuff. And if it is not known since which kernel version a feature is stable, but it is known to be stable, I suggest conservatively putting in the first kernel version people think is stable – or, in doubt, even putting in 4.7. It can still be lowered to earlier kernel versions later.

Well: I made a tiny start. I linked "Features by kernel version" more 
prominently on the main page, so it is easier to find and also added the 
following warning just above the table:

"WARNING: The "Version" row states at which version a feature has been merged 
into the mainline kernel. It does not tell anything about at which kernel 
version it is considered mature enough for production use."

Now I wonder: would adding a "Production ready" column, stating the first kernel version known to be stable, make sense in this table? What do you think? I can add the column and give some first rough, conservative estimations for a few features.

What do you think? Is this a good place?

Thanks,
-- 
Martin


Re: Is stability a joke?

2016-09-11 Thread Martin Steigerwald
On Sunday, 11 September 2016, 14:30:51 CEST, Waxhead wrote:
> > I think what would be a good next step would be to ask developers / users
> > about feature stability and then update the wiki. If thats important to
> > you, I suggest you invest some energy in doing that. And ask for help.
> > This mailinglist is a good idea.
> > 
> > I already gave you my idea on what works for me.
> > 
> > There is just one thing I won´t go further even a single step: The
> > complaining path. As it leads to no desirable outcome.
> > 
> > Thanks,
> 
> My intention was not to be hostile and if my response sound a bit harsh 
> for you then by all means I do apologize for that.

Okay, maybe I read something into your mail that you didn't intend to put there. Sorry. Let us focus on the constructive way to move forward with this.

Thanks,
-- 
Martin


Re: Is stability a joke?

2016-09-11 Thread Martin Steigerwald
On Sunday, 11 September 2016, 13:43:59 CEST, Martin Steigerwald wrote:
> > >> The Nouveau graphics driver have a nice feature matrix on it's webpage
> > >> and I think that BTRFS perhaps should consider doing something like
> > >> that
> > >> on it's official wiki as well
> > > 
> > > BTRFS also has a feature matrix. The links to it are in the "News"
> > > section
> > > however:
> > > 
> > > https://btrfs.wiki.kernel.org/index.php/Changelog#By_feature
> > 
> > I disagree, this is not a feature / stability matrix. It is a clearly a
> > changelog by kernel version.
> 
> It is a *feature* matrix. I fully said its not about stability, but about 
> implementation – I just wrote this a sentence after this one. There is no
> need  whatsoever to further discuss this as I never claimed that it is a
> feature / stability matrix in the first place.
> 
> > > Thing is: This just seems to be when has a feature been implemented
> > > matrix.
> > > Not when it is considered to be stable. I think this could be done with
> > > colors or so. Like red for not supported, yellow for implemented and
> > > green for production ready.
> > 
> > Exactly, just like the Nouveau matrix. It clearly shows what you can
> > expect from it.

I mentioned this matrix as a good *starting* point. And I think it would be easy to extend it:

Just add another column called "Production ready". Then research / ask about the production stability of each feature. The only challenge is: who is authoritative on that? I'd certainly ask the developer of a feature, but I'd also consider user reports to some extent.

Maybe that's the real challenge.

If you wish, I'd go through each feature there and give my own estimation. But I think there are others who are deeper into this.

I do think, for example, that scrubbing and automatic RAID repair are stable, except for RAID 5/6. I also consider device statistics and RAID 0 and 1 to be stable. I think RAID 10 is stable as well, but as I do not run it, I don't know. For me skinny-metadata is stable, too. So far even compress=lzo seems to be stable for me, but for others it may not be.
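
For completeness, by scrubbing and device statistics I just mean the standard commands, roughly like this – mount point as an example:

btrfs scrub start -Bd /home    # -B: stay in the foreground, -d: per-device statistics
btrfs scrub status /home       # result of the last scrub
btrfs device stats /home       # cumulative write/read/flush/corruption/generation error counters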

Since what kernel version? Now, there you go. I have no idea. All I know is that I started using BTRFS with kernel 2.6.38 or 2.6.39 on my laptop, but not as RAID 1 at that time.

See, the time a feature was implemented is much easier to assess. Maybe that's part of the reason why there is no stability matrix: maybe no one *exactly* knows *for sure*. How could you? So I would even put a footnote on that "production ready" column explaining "Considered to be stable according to developer and user opinions".

Of course it would additionally be good to read about experiences with corporate usage of BTRFS. I know at least Fujitsu, SUSE, Facebook and Oracle are using it, but I don't know in which configurations and with what experiences. One Oracle developer invests a lot of time in bringing BTRFS-like features to XFS, and Red Hat still favors XFS over BTRFS; even SLES defaults to XFS for /home and other non-/ filesystems. That also tells a story.

You can get some ideas from the SUSE release notes. Even if you do not want to use SLES, they tell you something, and I bet they are one of the better sources of information regarding your question at this time, cause I believe SUSE developers invested some time in assessing the stability of features – they would carefully assess what they can support in enterprise environments. There is also someone from Fujitsu who shared experiences in a talk; I can search for the URL of the slides again.

I bet Chris Mason and the other BTRFS developers at Facebook have some idea of what they use within Facebook as well. To what extent they are allowed to talk about it… I don't know. My personal impression is that as soon as Chris went to Facebook he became quite quiet. Maybe just due to being busy. Maybe due to Facebook being much more concerned about its own privacy than about that of its users.

Thanks,
-- 
Martin


Re: Is stability a joke?

2016-09-11 Thread Martin Steigerwald
On Sunday, 11 September 2016, 13:21:30 CEST, Zoiled wrote:
> Martin Steigerwald wrote:
> > Am Sonntag, 11. September 2016, 10:55:21 CEST schrieb Waxhead:
> >> I have been following BTRFS for years and have recently been starting to
> >> use BTRFS more and more and as always BTRFS' stability is a hot topic.
> >> Some says that BTRFS is a dead end research project while others claim
> >> the opposite.
> > 
> > First off: On my systems BTRFS definately runs too stable for a research
> > project. Actually: I have zero issues with stability of BTRFS on *any* of
> > my systems at the moment and in the last half year.
> > 
> > The only issue I had till about half an year ago was BTRFS getting stuck
> > at
> > seeking free space on a highly fragmented RAID 1 + compress=lzo /home.
> > This
> > went away with either kernel 4.4 or 4.5.
> > 
> > Additionally I never ever lost even a single byte of data on my own BTRFS
> > filesystems. I had a checksum failure on one of the SSDs, but BTRFS RAID 1
> > repaired it.
> > 
> > 
> > Where do I use BTRFS?
> > 
> > 1) On this ThinkPad T520 with two SSDs. /home and / in RAID 1, another
> > data
> > volume as single. In case you can read german, search blog.teamix.de for
> > BTRFS.
> > 
> > 2) On my music box ThinkPad T42 for /home. I did not bother to change / so
> > far and may never to so for this laptop. It has a slow 2,5 inch harddisk.
> > 
> > 3) I used it on Workstation at work as well for a data volume in RAID 1.
> > But workstation is no more (not due to a filesystem failure).
> > 
> > 4) On a server VM for /home with Maildirs and Owncloud data. /var is still
> > on Ext4, but I want to migrate it as well. Whether I ever change /, I
> > don´t know.
> > 
> > 5) On another server VM, a backup VM which I currently use with
> > borgbackup.
> > With borgbackup I actually wouldn´t really need BTRFS, but well…
> > 
> > 6) On *all* of my externel eSATA based backup harddisks for snapshotting
> > older states of the backups.
> 
> In other words, you are one of those who claim the opposite :) I have
> also myself run btrfs for a "toy" filesystem since 2013 without any
> issues, but this is more or less irrelevant since some people have
> experienced data loss thanks to unstable features that are not clearly
> marked as such.
> And making a claim that you have not lost a single byte of data does not
> make sense, how did you test this? SHA256 against a backup? :)

Do you have any proof like that with *any* other filesystem on Linux?

No, my claim is a bit weaker: BTRFS' own scrubbing feature, and no I/O errors on rsyncing my data over to the backup drive – BTRFS checks checksums on read as well. And yes, I know BTRFS uses a weaker hashing algorithm, I think crc32c. Yet this is still more than what I can say about *any* other filesystem I have used so far. To my current knowledge neither XFS nor Ext4/3 provide data checksumming. They do have metadata checksumming, and I found contradicting information on whether XFS may support data checksumming in the future, but up to now there is no *proof* *whatsoever* from the side of the filesystem that the data is what it was when I initially saved it. There may be bit errors rotting on any of your Ext4 and XFS filesystems without you even noticing for *years*. I think that is still unlikely, but it can happen – I have seen it years ago after restoring a backup with bit errors from a hardware RAID controller.

Of course, I rely on the checksumming feature within BTRFS – which may have 
errors. But even that is more than with any other filesystem I had before.

And I do not scrub daily, especially not the backup disks, but for any scrubs up to now: no issues. So, granted, my claim has been a bit bold. Right now I have no up-to-date scrubs, so all I can say is that I am not aware of any data loss up to the point in time when I last scrubbed my devices. Just redoing the scrubbing now on my laptop.

> >> The Debian wiki for BTRFS (which is recent by the way) contains a bunch
> >> of warnings and recommendations and is for me a bit better than the
> >> official BTRFS wiki when it comes to how to decide what features to use.
> > 
> > Nice page. I wasn´t aware of this one.
> > 
> > If you use BTRFS with Debian, I suggest to usually use the recent backport
> > kernel, currently 4.6.
> > 
> > Hmmm, maybe I better remove that compress=lzo mount option. Never saw any
> > issue with it, tough. Will research what they say about it.
> 
> My point exactly: You did not know about this and hence the risk of your
> data being gnawed on.

Well I do follow B

Re: Is stability a joke?

2016-09-11 Thread Martin Steigerwald
On Sunday, 11 September 2016, 10:55:21 CEST, Waxhead wrote:
> I have been following BTRFS for years and have recently been starting to
> use BTRFS more and more and as always BTRFS' stability is a hot topic.
> Some says that BTRFS is a dead end research project while others claim
> the opposite.

First off: on my systems BTRFS definitely runs too stably for a research project. Actually, I have zero issues with the stability of BTRFS on *any* of my systems at the moment and over the last half year.

The only issue I had, till about half a year ago, was BTRFS getting stuck seeking free space on a highly fragmented RAID 1 + compress=lzo /home. This went away with either kernel 4.4 or 4.5.

Additionally I never ever lost even a single byte of data on my own BTRFS 
filesystems. I had a checksum failure on one of the SSDs, but BTRFS RAID 1 
repaired it.


Where do I use BTRFS?

1) On this ThinkPad T520 with two SSDs: /home and / in RAID 1, another data volume as single. In case you can read German, search blog.teamix.de for BTRFS.

2) On my music box ThinkPad T42 for /home. I did not bother to change / so far and may never do so for this laptop. It has a slow 2.5 inch hard disk.

3) I used it on a workstation at work as well, for a data volume in RAID 1. But the workstation is no more (not due to a filesystem failure).

4) On a server VM for /home with Maildirs and ownCloud data. /var is still on Ext4, but I want to migrate it as well. Whether I ever change /, I don't know.

5) On another server VM, a backup VM which I currently use with borgbackup. With borgbackup I actually wouldn't really need BTRFS, but well…

6) On *all* of my external eSATA-based backup hard disks, for snapshotting older states of the backups.

> The Debian wiki for BTRFS (which is recent by the way) contains a bunch
> of warnings and recommendations and is for me a bit better than the
> official BTRFS wiki when it comes to how to decide what features to use.

Nice page. I wasn't aware of this one.

If you use BTRFS with Debian, I suggest usually running the recent backports kernel, currently 4.6.

Hmmm, maybe I had better remove that compress=lzo mount option. Never saw any issue with it, though. Will research what they say about it.

> The Nouveau graphics driver have a nice feature matrix on it's webpage
> and I think that BTRFS perhaps should consider doing something like that
> on it's official wiki as well

BTRFS also has a feature matrix. The links to it are in the "News" section 
however:

https://btrfs.wiki.kernel.org/index.php/Changelog#By_feature

Thing is: this just seems to be a "when has a feature been implemented" matrix, not a "when is it considered stable" one. I think this could be done with colors or so: red for not supported, yellow for implemented and green for production ready.

Another hint you can get from reading the SLES 12 release notes. SUSE has dared to support BTRFS for quite a while – frankly, I think for SLES 11 SP3 this was premature, at least for the initial release without updates: I have a VM whose BTRFS I can break very easily, having BTRFS say it is full while it still has 2 GB free. But well… this still seems to happen for some people, according to the threads on the BTRFS mailing list.

SUSE doesn't support all of BTRFS. They even put features they do not support behind an "allow_unsupported=1" module option:

https://www.suse.com/releasenotes/x86_64/SUSE-SLES/12/#fate-314697

But they even seem to contradict themselves by claiming they support RAID 0, RAID 1 and RAID 10, but not RAID 5 or RAID 6, while putting RAID behind that module option – or I misunderstood their RAID statement:

"Btrfs is supported on top of MD (multiple devices) and DM (device mapper) 
configurations. Use the YaST partitioner to achieve a proper setup. 
Multivolume Btrfs is supported in RAID0, RAID1, and RAID10 profiles in SUSE 
Linux Enterprise 12, higher RAID levels are not yet supported, but might be 
enabled with a future service pack."

and they only support BTRFS on top of MD for RAID. They also do not support compression yet. They do not even support big metadata.

https://www.suse.com/releasenotes/x86_64/SUSE-SLES/12/#fate-317221

Interestingly enough, Red Hat only supports BTRFS as a technology preview, even with RHEL 7.

> For example something along the lines of  (the statuses are taken
> our of thin air just for demonstration purposes)

I'd say feel free to work with the feature matrix already there and fill in information about stability. I think it makes sense, though, to discuss first how to do it while still keeping it manageable.

Thanks,
-- 
Martin


Re: kworker threads may be working saner now instead of using 100% of a CPU core for minutes (Re: Still not production ready)

2016-09-07 Thread Martin Steigerwald
On Wednesday, 7 September 2016, 11:53:04 CEST, Christian Rohmann wrote:
> On 03/20/2016 12:24 PM, Martin Steigerwald wrote:
> >> btrfs kworker thread uses up 100% of a Sandybridge core for minutes on
> >> 
> >> > random write into big file
> >> > https://bugzilla.kernel.org/show_bug.cgi?id=90401
> > 
> > I think I saw this up to kernel 4.3. I think I didn´t see this with 4.4
> > anymore and definately not with 4.5.
> > 
> > So it may be fixed.
> > 
> > Did anyone else see kworker threads using 100% of a core for minutes with
> > 4.4 / 4.5?
> 
> I run 4.8rc5 and currently see this issue. kworking has been running at
> 100% for hours now, seems stuck there.
> 
> Anything I should look at in order to narrow this down to a root cause?

I haven't seen any issues since my last post; I am currently running 4.8-rc5 myself.

I suggest you look at the kernel log and review this thread and my bug report for the other information I came up with. Particularly, in my case the issue only happened when BTRFS had allocated all device space into chunks, but the space in the chunks was not fully used up yet – i.e. when BTRFS had to seek for free space within existing chunks and couldn't just allocate a new chunk anymore. In addition to that: your BTRFS configuration, storage configuration, and so on. Just review what I reported to get an idea.
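
A quick way to check whether you are in the same situation – all device space allocated to chunks, but the chunks themselves not full – is roughly this, mount point as an example:

btrfs fi show /home    # "size" equal to "used" per devid means everything is allocated to chunks
btrfs fi df /home      # "total" versus "used" per block group type shows how full the chunks are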

If you are sufficiently sure that your issue is the same from looking at the kernel log – so if the backtraces look sufficiently similar – then I'd add to my bug report. Otherwise I'd open a new one.
Good luck.
-- 
Martin


Re: dd on wrong device, 1.9 GiB from the beginning has been overwritten, how to restore partition?

2016-06-12 Thread Martin Steigerwald
Hi Maximilian,

On Sunday, 12 June 2016 23:22:11 CEST Maximilian Böhm wrote:
> Hi there, I did something terribly wrong, all blame on me. I wanted to
> write to an USB stick but /dev/sdc wasn't the stick in this case but
> an attached HDD with GPT and an 8 TB btrfs partition…
> 
> $ sudo dd bs=4M if=manjaro-kde-16.06.1-x86_64.iso of=/dev/sdc
> 483+1 Datensätze ein
> 483+1 Datensätze aus
> 2028060672 bytes (2,0 GB, 1,9 GiB) copied, 16,89 s, 120 MB/s
> 
> So, shit.
> 
> $ sudo btrfs check --repair /dev/sdc
> enabling repair mode
> No valid Btrfs found on /dev/sdc
> Couldn't open file system
> 
> $ sudo btrfs-find-root /dev/sdc
> No valid Btrfs found on /dev/sdc
> ERROR: open ctree failed
> 
> $ sudo btrfs-show-super /dev/sdc --all
> superblock: bytenr=65536, device=/dev/sdc
> -
> ERROR: bad magic on superblock on /dev/sdc at 65536
> 
> superblock: bytenr=67108864, device=/dev/sdc
> -
> ERROR: bad magic on superblock on /dev/sdc at 67108864
> 
> superblock: bytenr=274877906944, device=/dev/sdc
> -
> ERROR: bad magic on superblock on /dev/sdc at 274877906944
> 
> 
> System infos:
> 
> $ uname -a
> Linux Mongo 4.6.2-1-MANJARO #1 SMP PREEMPT Wed Jun 8 11:00:08 UTC 2016
> x86_64 GNU/Linux
> 
> $ btrfs --version
> btrfs-progs v4.5.3
> 
> Don't think dmesg is necessary here.
> 
> 
> OK, the btrfs wiki says there is a second superblock at 64 MiB
> (overwritten too in my case) and a third at 256 GiB ("0x40").
> But how to restore it? And how to restore the general btrfs header
> metadata? How to restore GPT without doing something terrible again?
> Would be glad for any help!

But it says bad magic on that one as well.

Well, no idea if there is any chance to fix BTRFS in this situation.

Does btrfs restore do anything useful in copying off what it can still find from this device? It does not work in place, so you need additional space to let it restore to.
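
What I mean is roughly this – it needs a separate target filesystem with enough free space, and the target path here is just an example:

mkdir -p /mnt/rescue-target
btrfs restore -v /dev/sdc /mnt/rescue-target    # copies whatever files it can still find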

If BTRFS cannot be salvaged… you can still have a go with photorec, but it will not recover filenames and directory structure, just the data of any file in a known format that it finds in one piece.

I suspect you have no backup.

So *good* luck.


I do think, though, that dd should just bail out or warn for a BTRFS filesystem that is still mounted – or wasn't it mounted at that time?

I also think it would be good to add an existing-filesystem check just like in mkfs.btrfs, mkfs.xfs and so on. I'd like that, but that would be a suggestion for the coreutils people.
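
Until something like that exists, one could do the check by hand with a tiny wrapper around dd – just a sketch, untested; blkid -p only probes and prints what it finds, it does not modify the device:

#!/bin/sh
# dd wrapper that refuses to overwrite a device which still carries a
# known filesystem/RAID signature (blkid -p exits non-zero if it finds nothing)
target="$1"; shift
if blkid -p "$target" >/dev/null 2>&1; then
    echo "refusing: $target already contains a filesystem signature" >&2
    exit 1
fi
exec dd of="$target" "$@"

Called e.g. as: checked-dd /dev/sdc if=some-image.iso bs=4M – of course this only helps if one remembers to use the wrapper.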

Yes, Unix is for people who know what they are doing… unless they don't. And in the end even one of the most experienced admins could make such a mistake.

Goodnight,
-- 
Martin


Re: RFE: 'btrfs' tools machine readable output

2016-05-16 Thread Martin Steigerwald
Hello Richard,

On Monday, 16 May 2016 13:14:56 CEST Richard W.M. Jones wrote:
> I don't have time to implement this right now, so I'm just posting
> this as a suggestion/request ...
> 
> It would be really helpful if the btrfs tools had a machine-readable
> output.
> 
> Libguestfs parses btrfs tools output in a number of places, eg:
> https://github.com/libguestfs/libguestfs/blob/master/daemon/btrfs.c
> This is a massive PITA because each time a new release of btrfs-progs
> comes along it changes the output slightly, and we end up having
> to add all sorts of hacks.
> 
> With machine-readable output, there'd be a flag which would
> change the output.  eg:

I wonder whether parsing text-based output is really the most elegant method here.

How about a libbtrfs, so that other tools can benefit from btrfs tool functionality? That way desktop environments wishing to make use of snapshot functionality or advanced disk usage reporting, for example, could also use it easily without calling external commands.

Of course it would likely be more effort than implementing structured output.

Thanks,
-- 
Martin


Re: btrfs send/receive using generation number as source

2016-04-08 Thread Martin Steigerwald
On Friday, 8 April 2016 11:12:54 CEST Hugo Mills wrote:
> On Fri, Apr 08, 2016 at 01:01:03PM +0200, Martin Steigerwald wrote:
> > Hello!
> > 
> > As far as I understood, for differential btrfs send/receive – I didn´t use
> > it yet – I need to keep a snapshot on the source device to then tell
> > btrfs send to send the differences between the snapshot and the current
> > state.
> > 
> > Now the BTRFS filesystems on my SSDs are often quite full, thus I do not
> > keep any snapshots except for one during rsync or borgbackup script
> > run-time.
> > 
> > Is it possible to tell btrfs send to use generation number xyz to
> > calculate
> > the difference? This way, I wouldn´t have to keep a snapshot around, I
> > believe.
> 
>btrfs sub find-new
> 
>BUT that will only tell you which files have been added or updated.
> It won't tell you which files have been deleted. It's also unrelated
> to send/receive, so you'd have to roll your own solution.

I am aware of this one.

> > I bet not, at the time cause -c wants a snapshot. Ah and it wants a
> > snapshot of the same state on the destination as well. Well on the
> > destination I let the script make a snapshot after the backup so…
> > what I would need is to remember the generation number of the source
> > snapshot that the script creates to backup from and then tell btrfs
> > send that generation number + the destination snapshots.
> > 
> > Well, or get larger SSDs or get rid of some data on them.
> 
>Those are the other options, of course.

Hm, I see.

Thanks,
-- 
Martin


btrfs send/receive using generation number as source

2016-04-08 Thread Martin Steigerwald
Hello!

As far as I understand, for differential btrfs send/receive – I haven't used it yet – I need to keep a snapshot on the source device, and then tell btrfs send to send the differences between the snapshot and the current state.

Now the BTRFS filesystems on my SSDs are often quite full, thus I do not keep 
any snapshots except for one during rsync or borgbackup script run-time.

Is it possible to tell btrfs send to use generation number xyz to calculate the difference? This way I wouldn't have to keep a snapshot around, I believe.

I bet not, at this time, cause -c wants a snapshot. Ah, and it wants a snapshot of the same state on the destination as well. Well, on the destination I let the script make a snapshot after the backup, so… what I would need is to remember the generation number of the source snapshot that the script creates to back up from, and then tell btrfs send that generation number plus the destination snapshot.
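
For comparison, the snapshot-based incremental workflow I am trying to avoid would look roughly like this – paths made up:

# initial full send
btrfs subvolume snapshot -r /home /home/.snap-prev
btrfs send /home/.snap-prev | btrfs receive /backup

# later: incremental send against the kept parent snapshot
btrfs subvolume snapshot -r /home /home/.snap-new
btrfs send -p /home/.snap-prev /home/.snap-new | btrfs receive /backup
btrfs subvolume delete /home/.snap-prev    # only the newest parent has to stay around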

Well, or get larger SSDs or get rid of some data on them.

Thanks,
-- 
Martin


Re: csum errors in VirtualBox VDI files

2016-03-27 Thread Martin Steigerwald
On Tuesday, 22 March 2016 09:03:42 CEST Kai Krakow wrote:
> Hello!
> 
> Since one of the last kernel updates (I don't know which exactly), I'm
> experiencing csum errors within VDI files when running VirtualBox. A
> side effect of this is, as soon as dmesg shows these errors, commands
> like "du" and "df" hang until reboot.
> 
> I've now restored the file from backup but it happens over and over
> again.

Just as another data point: I am irregularly using a VM with VirtualBox in a VDI file on a BTRFS RAID 1 on two SSDs and have had no such issues so far, up to kernel 4.5.

Thanks,
-- 
Martin


Re: New file system with same issue (was: Again, no space left on device while rebalancing and recipe doesnt work)

2016-03-27 Thread Martin Steigerwald
On Tuesday, 15 March 2016 08:07:22 CEST Marc Haber wrote:
> On Mon, Mar 14, 2016 at 09:39:51PM +0100, Henk Slager wrote:
> > >> BTW, I restored and mounted your 20160307-fanbtr-image:
> > >> 
> > >> [266169.207952] BTRFS: device label fanbtr devid 1 transid 22215732
> > >> /dev/loop0 [266203.734804] BTRFS info (device loop0): disk space
> > >> caching is enabled [266203.734806] BTRFS: has skinny extents
> > >> [266204.022175] BTRFS: checking UUID tree
> > >> [266239.407249] attempt to access beyond end of device
> > >> [266239.407252] loop0: rw=1073, want=715202688, limit=70576
> > >> [266239.407254] BTRFS error (device loop0): bdev /dev/loop0 errs: wr
> > >> 1, rd 0, flush 0, corrupt 0, gen 0
> > >> [266239.407272] attempt to access beyond end of device
> > >> .. and 16 more
> > >> 
> > >> As a quick fix/workaround, I truncated the image to 1T
> > > 
> > > The original fs was 417 GiB in size. What size does the image claim?
> > 
> > ls -alFh  of the restored image showed 337G I remember.
> > btrfs fi us showed also a number over 400G, I don't have the
> > files/loopdev anymore.
> 
> sounds legit.
> 
> > It could some side effect of btrfs-image, I only have used it for
> > multi-device, where dev id's are ignore, but total image size did not
> > lead to problems.
> 
> The original "ofanbtr" seems to have a problem, since btrfs check
> 
> /media/tempdisk says:
> > > [10/509]mh@fan:~$ sudo btrfs check /media/tempdisk/
> > > Superblock bytenr is larger than device size
> > > Couldn't open file system
> > > [11/509]mh@fan:~$
> > > 
> > > Can this be fixed?
> > 
> > What I would do in order to fix it, is resize the fs to let's say
> > 190GiB. That should write correct values to the superblocks I /hope/.
> > And then resize back to max.
> 
> It doesn't:
> [20/518]mh@fan:~$ sudo btrfs filesystem resize 300G /media/tempdisk/
> Resize '/media/tempdisk/' of '300G'
> [22/520]mh@fan:~$ sudo btrfs check /media/tempdisk/
> Superblock bytenr is larger than device size
> Couldn't open file system
> [23/521]mh@fan:~$ df -h

Are you trying the check on the *mounted* filesystem? "/media/tempdisk" appears to be a mount point, not a device file.

Unmount it and run the check against the device file of the filesystem.
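
So roughly – the device path is just an example, use whatever actually backs that mount point:

umount /media/tempdisk
btrfs check /dev/sdXN    # run the check against the device file, not the mount point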

Thanks,
-- 
Martin


Re: unable to mount btrfs partition, please help :(

2016-03-20 Thread Martin Steigerwald
On Sunday, 20 March 2016 10:18:26 CET Patrick Tschackert wrote:
> > I think in retrospect the safe way to do these kinds of Virtual Box
> > updates, which require kernel module updates, would have been to
> > shutdown the VM and stop the array. *shrug*
> 
>  
> After this, I think I'll just do away with the virtual machine on this host,
> as the app contained in that vm can also run on the host. I tried to be
> fancy, and it seems to needlessly complicate things.

I am not completely sure and I have no exact reference anymore, but I think I read more than once about filesystem benchmarks running faster in VirtualBox than on the physical system, which may point to an at least incomplete fsync() implementation for writes into VirtualBox image files.

I never found any proof of this, nor did I specifically seek to research it. So it may be true or not.

Thanks,
-- 
Martin


Re: unable to mount btrfs partition, please help :(

2016-03-20 Thread Martin Steigerwald
On Saturday, 19 March 2016 19:34:55 CET Chris Murphy wrote:
> >>> $ uname -a
> >>> Linux vmhost 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt20-1+deb8u4
> >>> (2016-02-29) x86_64 GNU/Linux
> >>
> >>This is old. You should upgrade to something newer, ideally 4.5 but
> >>4.4.6 is good also, and then oldest I'd suggest is 4.1.20.
> >>
> > Shouldn't I be able to get the newest kernel by executing "apt-get update
> > && apt-get dist-upgrade"? That's what I ran just now, and it doesn't
> > install a newer kernel. Do I really have to manually upgrade to a newer
> > one?
> I'm not sure. You might do a list search for debian, as I know debian
> users are using newer kernels that they didn't build themselves.

Try a backports¹ kernel. Add backports to your sources and run

apt-cache search linux-image 

I use the 4.3 backports kernel successfully on two server VMs which use BTRFS.

[1] http://backports.debian.org/
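
On Debian 8 that is roughly the following – the exact suite name and kernel version depend on your release, so treat this as a sketch:

echo "deb http://ftp.debian.org/debian jessie-backports main" >> /etc/apt/sources.list.d/backports.list
apt-get update
apt-get -t jessie-backports install linux-image-amd64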

Thx,
-- 
Martin


Re: [RFC] Experimental btrfs encryption

2016-03-20 Thread Martin Steigerwald
On Wednesday, 2 March 2016 09:06:57 CET Qu Wenruo wrote:
> And maybe I just missed something, but the filename seems not touched, 
> meaning it will leak a lot of information.
> Just like default eCryptfs behavior.
> 
> I understand that's an easy design and it's not a high priority thing, 
> but I hope we can encrypt the subvolume tree blocks too, if using 
> per-subvolume policy.
> To provide a feature near block-level encryption.

I'd really love an approach that can, at least optionally, hide the metadata structure completely, except for which blocks on the block device are allocated – i.e. not just encrypting filenames, but encrypting the directory structure, the number of files, their dates and their sizes. I am not sure whether BTRFS can allow this and still be at least btrfs-check'able without unlocking the encryption key. Ideally this could even be backed up by btrfs send/receive as a kind of opaque stream.

This would put BTRFS encryption support above anything that is available with Ext4, F2FS, eCryptfs and EncFS. It would be ideal for encryption on SSDs: no need to encrypt unallocated blocks, yet still most of the advantages of block-level encryption – even if some would argue that you can find something out by checking which blocks are allocated or not, and of course the total size of the subvolume and which chunks it allocates are known.

I would not take this as a requirement for any initial approach and would be happy about anything that does filename encryption, like eCryptfs or the Ext4/F2FS approach – but if the subvolume specifics of BTRFS can be used to encrypt more of the metadata, then even better!

Thanks,
-- 
Martin


kworker threads may be working saner now instead of using 100% of a CPU core for minutes (Re: Still not production ready)

2016-03-20 Thread Martin Steigerwald
On Sunday, 13 December 2015 23:35:08 CET Martin Steigerwald wrote:
> Hi!
> 
> For me it is still not production ready. Again I ran into:
> 
> btrfs kworker thread uses up 100% of a Sandybridge core for minutes on
> random write into big file
> https://bugzilla.kernel.org/show_bug.cgi?id=90401

I think I saw this up to kernel 4.3. I think I didn't see it with 4.4 anymore, and definitely not with 4.5.

So it may be fixed.

Did anyone else see kworker threads using 100% of a core for minutes with 4.4 
/ 4.5?


For me this would be a big step forward. And yes, I am aware some people have new and other issues, but for me a non-working balance – it may also be broken here with "no space left on device", it errored out often enough – is still something different from having to switch the device off hard, unless you want to give it a ton of time to eventually shut down, which is not an option if you just want to work with your system.


In any case many thanks to all the developers working on improving BTRFS, and 
especially those who bring in bug fixes. I do think BTRFS still needs more 
stability work when I read through the recent mailing list threads.

Thanks,
Martin

> No matter whether SLES 12 uses it as default for root, no matter whether
> Fujitsu and Facebook use it: I will not let this onto any customer machine
> without lots and lots of underprovisioning and rigorous free space
> monitoring. Actually I will renew my recommendations in my trainings to be
> careful with BTRFS.
> 
> From my experience the monitoring would check for:
> 
> merkaba:~> btrfs fi show /home
> Label: 'home'  uuid: […]
> Total devices 2 FS bytes used 156.31GiB
> devid1 size 170.00GiB used 164.13GiB path /dev/mapper/msata-home
> devid2 size 170.00GiB used 164.13GiB path /dev/mapper/sata-home
> 
> If "used" is same as "size" then make big fat alarm. It is not sufficient
> for it to happen. It can run for quite some time just fine without any
> issues, but I never have seen a kworker thread using 100% of one core for
> extended period of time blocking everything else on the fs without this
> condition being met.
> 
> 
> In addition to that last time I tried it aborts scrub any of my BTRFS
> filesstems. Reported in another thread here that got completely ignored so
> far. I think I could go back to 4.2 kernel to make this work.
> 
> 
> I am not going to bother to go into more detail on any on this, as I get the
> impression that my bug reports and feedback get ignored. So I spare myself
> the time to do this work for now.
> 
> 
> Only thing I wonder now whether this all could be cause my /home is already
> more than one and a half year old. Maybe newly created filesystems are
> created in a way that prevents these issues? But it already has a nice
> global reserve:
> 
> merkaba:~> btrfs fi df /
> Data, RAID1: total=27.98GiB, used=24.07GiB
> System, RAID1: total=19.00MiB, used=16.00KiB
> Metadata, RAID1: total=2.00GiB, used=536.80MiB
> GlobalReserve, single: total=192.00MiB, used=0.00B
> 
> 
> Actually when I see that this free space thing is still not fixed for good I
> wonder whether it is fixable at all. Is this an inherent issue of BTRFS or
> more generally COW filesystem design?
> 
> I think it got somewhat better. It took much longer to come into that state
> again than last time, but still, blocking like this is *no* option for a
> *production ready* filesystem.
> 
> 
> 
> I am seriously consider to switch to XFS for my production laptop again.
> Cause I never saw any of these free space issues with any of the XFS or
> Ext4 filesystems I used in the last 10 years.
> 
> Thanks,


-- 
Martin


Re: Again, no space left on device while rebalancing and recipe doesnt work

2016-02-27 Thread Martin Steigerwald
On Saturday, 27 February 2016 22:14:50 CET Marc Haber wrote:
> Hi,

Hi Marc.

> I have again the issue of no space left on device while rebalancing
> (with btrfs-tools 4.4.1 on kernel 4.4.2 on Debian unstable):
> 
> mh@fan:~$ sudo btrfs balance start /mnt/fanbtr
> ERROR: error during balancing '/mnt/fanbtr': No space left on device
> mh@fan:~$ sudo btrfs fi show /mnt/fanbtr
> mh@fan:~$ sudo btrfs fi show -m
> Label: 'fanbtr'  uuid: 4198d1bc-e3ce-40df-a7ee-44a2d120bff3
> Total devices 1 FS bytes used 116.49GiB
> devid1 size 417.19GiB used 177.06GiB path /dev/mapper/fanbtr

Hmmm, that's still a ton of space to allocate chunks from.

> mh@fan:~$ sudo btrfs fi df /mnt/fanbtr
> Data, single: total=113.00GiB, used=112.77GiB
> System, DUP: total=32.00MiB, used=48.00KiB
> Metadata, DUP: total=32.00GiB, used=3.72GiB
> GlobalReserve, single: total=512.00MiB, used=0.00B
> mh@fan:~$
> 
> The filesystem was recently resized from 300 GB to 420 GB.
> 
> Why does btrfs fi show /mnt/fanbtr not give any output? Wy does btrfs
> fi df /mnt/fanbtr say that my data space is only 113 GiB large?

Cause it is.

The "used" in "devid 1" line is btrfs fi sh is "data + 2x system + 2x metadata 
= 113 GiB + 2 * 32 GiB + 2 * 32 MiB, i.e. what amount of the size of the 
device is allocated for chunks.

The value one line above is what is allocated inside the chunks.

I.e. the line in "devid 1" is "total" of btrfs fi df summed up, and the line 
above is "used" in btrfs fi df summed up. And… with more devices you have more 
fun.
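
Spelled out with the numbers quoted above – just as a quick sanity check:

  113.00 GiB (data) + 2 * 32.00 GiB (metadata, DUP) + 2 * 32.00 MiB (system, DUP)
  = 113 GiB + 64 GiB + 0.0625 GiB = 177.06 GiB (rounded)

which is exactly the "used" shown for devid 1 in btrfs fi show.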

I suggest:

merkaba:~> btrfs fi usage -T /daten
Overall:
Device size: 235.00GiB
Device allocated:227.04GiB
Device unallocated:7.96GiB
Device missing:  0.00B
Used:225.84GiB
Free (estimated):  8.48GiB  (min: 8.48GiB)
Data ratio:   1.00
Metadata ratio:   1.00
Global reserve:  128.00MiB  (used: 0.00B)

 Data  Metadata  System  
Id Path  singlesinglesingle   Unallocated
-- - - -  ---
 1 /dev/dm-1 226.00GiB   1.01GiB 32.00MiB 7.96GiB
-- - - -  ---
   Total 226.00GiB   1.01GiB 32.00MiB 7.96GiB
   Used  225.48GiB 371.83MiB 48.00KiB 

as that is much clearer to read IMHO.

and

merkaba:~> btrfs device usage /daten   
/dev/dm-1, ID: 1
   Device size:   235.00GiB
   Data,single:   226.00GiB
   Metadata,single: 1.01GiB
   System,single:  32.00MiB
   Unallocated: 7.96GiB

(although that is included in the filesystem usage output)


Or for a BTRFS RAID 1:

merkaba:~> btrfs fi usage -T /home  
Overall:
Device size: 340.00GiB
Device allocated:340.00GiB
Device unallocated:2.00MiB
Device missing:  0.00B
Used:306.47GiB
Free (estimated): 14.58GiB  (min: 14.58GiB)
Data ratio:   2.00
Metadata ratio:   2.00
Global reserve:  512.00MiB  (used: 0.00B)

 Data  Metadata System  
Id Path  RAID1 RAID1RAID1Unallocated
-- - -   ---
 1 /dev/dm-0 163.94GiB  6.03GiB 32.00MiB 1.00MiB
 2 /dev/dm-3 163.94GiB  6.03GiB 32.00MiB 1.00MiB
-- - -   ---
   Total 163.94GiB  6.03GiB 32.00MiB 2.00MiB
   Used  149.36GiB  3.88GiB 48.00KiB


merkaba:~> btrfs device usage /home
/dev/dm-0, ID: 1
   Device size:   170.00GiB
   Data,RAID1:163.94GiB
   Metadata,RAID1:  6.03GiB
   System,RAID1:   32.00MiB
   Unallocated: 1.00MiB

/dev/dm-3, ID: 2
   Device size:   170.00GiB
   Data,RAID1:163.94GiB
   Metadata,RAID1:  6.03GiB
   System,RAID1:   32.00MiB
   Unallocated: 1.00MiB


(this is actually the situation asking for hung-task trouble, with kworker 
threads searching for free space inside chunks because no new chunks can be 
allocated – let's hope kernel 4.4 finally has fixes for this)
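
A state like this is also easy to watch for. A minimal monitoring sketch – 
assuming a btrfs-progs whose "fi show" understands the --raw unit switch; the 
mount point and the 90% threshold are only illustrative:

#!/bin/sh
# Warn when a device has (nearly) all of its raw space allocated to chunks.
# Parses the "devid N size S used U path P" lines of btrfs fi show.
btrfs fi show --raw /home | awk '
  /devid/ {
    size = $4; used = $6; path = $8
    if (used / size > 0.90)
      printf "WARNING: %s: %.0f%% of device allocated to chunks\n",
             path, 100 * used / size
  }'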

> btrfs balance start -dusage=5 works up to -dusage=100:
> 
> mh@fan:~$ sudo btrfs balance start -dusage=100 /mnt/fanbtr
> Done, had to relocate 111 out of 179 chunks
> mh@fan:~$ sudo btrfs balance start -dusage=100 /mnt/fanbtr
> Done, had to relocate 111 out of 179 chunks
> mh@fan:~$ sudo btrfs balance start -dusage=100 /mnt/fanbtr
> Done, had to relocate 110 out of 179 chunks
> mh@fan:~$ sudo btrfs balance start -dusage=100 /mnt/fanbtr
> Done, had to relocate 109 out of 179 chunks
> mh@fan:~$ sudo btrfs balance start /mnt/fanbtr
> ERROR: error during balancing '/mnt/fanbtr': No space left on device
> mh@fan:~$
> 
> What is going on here? How do I get away from here?

Others may have 

Re: Use fast device only for metadata?

2016-02-07 Thread Martin Steigerwald
On Sunday, 7 February 2016, 21:07:13 CET Kai Krakow wrote:
> On Sun, 07 Feb 2016 11:06:58 -0800
> 
> Nikolaus Rath wrote:
> > Hello,
> > 
> > I have a large home directory on a spinning disk that I regularly
> > synchronize between different computers using unison. That takes ages,
> > even though the amount of changed files is typically small. I suspect
> > most if the time is spend walking through the file system and checking
> > mtimes.
> > 
> > So I was wondering if I could possibly speed-up this operation by
> > storing all btrfs metadata on a fast, SSD drive. It seems that
> > mkfs.btrfs allows me to put the metadata in raid1 or dup mode, and the
> > file contents in single mode. However, I could not find a way to tell
> > btrfs to use a device *only* for metadata. Is there a way to do that?
> > 
> > Also, what is the difference between using "dup" and "raid1" for the
> > metadata?
> 
> You may want to try bcache. It will speedup random access which is
> probably the main cause for your slow sync. Unfortunately it requires
> you to reformat your btrfs partitions to add a bcache superblock. But
> it's worth the efforts.
> 
> I use a nightly rsync to USB3 disk, and bcache reduced it from 5+ hours
> to typically 1.5-3 depending on how much data changed.

An alternative is dm-cache (lvmcache); I think it does not require recreating 
the filesystem.
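
Roughly, attaching an lvmcache (dm-cache) pool to an existing LV looks like 
the sketch below – VG name, devices and sizes are made up, and option 
spellings may differ between lvm2 versions, so check lvmcache(7) first:

pvcreate /dev/sdc1                  # the SSD (or a partition of it)
vgextend vg0 /dev/sdc1              # add it to the VG holding the slow LV
lvcreate -L 100G -n cache0     vg0 /dev/sdc1
lvcreate -L 1G   -n cache0meta vg0 /dev/sdc1
lvconvert --type cache-pool --poolmetadata vg0/cache0meta vg0/cache0
lvconvert --type cache --cachepool vg0/cache0 vg0/home
                                    # the filesystem on vg0/home stays intact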

I wonder what happened to the VFS hot data tracking patch set that was 
floating around here quite some time ago.

-- 
Martin


Re: btrfs-progs and btrfs(8) inconsistencies

2016-02-04 Thread Martin Steigerwald
On Thursday, 4 February 2016, 09:57:54 CET Moviuro wrote:
> > Although personally I like to let all the backward compatibility
> > things go hell, but that's definitely not how things work. :(
> > 
> > 2) End-user taste.
> > Some end-users like such info as feedback of success.
> > Of course other users like it act as silent as possible.
> 
> I'm pretty sure that's... not the case. Almost everything on GNU/Linux
> is silent. cd(1) is silent, cp(1) is silent, rm(1)...
> What they all have though is a -v|--verbose switch.

The various mkfs commands are not – not one of them that I know of. 
Additionally, each one gives different output.

pvcreate, vgcreate, lvcreate as well as the remove commands and probably 
other LVM commands are not (one could argue that conceptually they come from 
HP-UX, but that's a Unix as well):

merkaba:~> lvcreate -L 1G -n bla sata
  Logical volume "bla" created.

And I think, without testing it right now, that mdadm is also not silent when 
creating a software RAID.

So while I agree with you that regular shell commands (coreutils, util-linux) 
are usually silent on success, this does not appear to be the case with 
storage-related commands in GNU/Linux.

I don't have a clear opinion about it other than that I'd like to see some 
standard too. coreutils and util-linux both seem to have some kind of 
standard, although not necessarily the same one, I bet. And I am not sure 
whether it is documented anywhere.

-- 
Martin


Re: Btrfs Check - "type mismatch with chunk"

2016-01-05 Thread Martin Steigerwald
On Tuesday, 5 January 2016, 15:34:35 CET Duncan wrote:
> Christoph Anton Mitterer posted on Sat, 02 Jan 2016 06:12:46 +0100 as
> 
> excerpted:
> > On Fri, 2015-12-25 at 08:06 +, Duncan wrote:
> >> I wasn't personally sure if 4.1 itself was affected or not, but the
> >> wiki says don't use 4.1.1 as it's broken with this bug, with the
> >> quick-fix in 4.1.2, so I /think/ 4.1 itself is fine.  A scan with a
> >> current btrfs check should tell you for sure.  But if you meant 4.1.1
> >> and only typed 4.1, then yes, better redo.
> > 
> > What exactly was that bug in 4.1.1 mkfs and how would one notice that
> > one suffers from it?
> > I created a number of personal filesystems that I use "productively" and
> > I'm not 100% sure during which version I've created them... :/
> >
> > 
> >
> > Is there some easy way to find out, like a fs creation time stamp??
> 
> I believe a current btrfs check will flag the errors, but can't fix them, 
> as the problem was in the filesystem creation and is simply too deep to 
> fix, so the bad filesystems must be wiped and recreated with a mkfs.btrfs 
> without the bug, to fix.

btrfs check from btrfs-tools 4.3.1 on kernel 4.4-rc6 was not able to fix these 
errors, so I recreated the filesystem that had them. I think I mentioned this 
in this thread as well.

Thanks,
-- 
Martin


Re: btrfs scrub failing

2016-01-03 Thread Martin Steigerwald
On Sunday, 3 January 2016, 17:33:03 CET John Center wrote:
> Hi Martin,

Hi John,

> One thing I forgot, I did run btrfs-image & it appears to have successfully
> completed afaict. Do you think it would be useful to someone for future
> troubleshooting?

I leave that to the devs to decide. Maybe, if it's not too large, you can keep 
it for a while and ask the devs whether they want it. But as it contains the 
type mismatch issue that they already know about and fixed in mkfs.btrfs, it 
may not be of much use to them.

Thank you,
Martin

> 
> Thanks.
> 
> -John
> 
> Sent from my iPhone
> 
> > On Jan 3, 2016, at 5:06 AM, Martin Steigerwald <mar...@lichtvoll.de>
> > wrote:
> > 
> > On Sunday, 3 January 2016, 02:02:12 CET John Center wrote:
> >> Hi Martin & Duncan,
> > 
> > Hi John,
> > 
> >> Since I had a backup of my data, I first ran "btrfs check -p" on the
> >> unmounted array.  It first found 3 parent transid errors:
> >> 
> >> root@ubuntu:~# btrfs check -p /dev/md126p2
> >> Checking filesystem on /dev/md126p2
> >> UUID: 9b5a6959-7df1-4455-a643-d369487d24aa
> >> parent transid verify failed on 97763328 wanted 33736296 found 181864
> >> ...
> >> Ignoring transid failure
> >> parent transid verify failed on 241287168 wanted 33554449 found 17
> >> ...
> >> Ignoring transid failure
> >> parent transid verify failed on 1016217600 wanted 33556071 found 1639
> >> ...
> >> Ignoring transid failure
> >> 
> >> Then a huge number of bad extent mismatches:
> >> 
> >> bad extent [29360128, 29376512), type mismatch with chunk
> >> bad extent [29376512, 29392896), type mismatch with chunk
> >> ...
> >> bad extent [1039947448320, 1039947464704), type mismatch with chunk
> >> bad extent [1039948005376, 1039948021760), type mismatch with chunk
> > 
> > Due to these I recommend you redo the BTRFS filesystem using your backup.
> > See the other thread where Duncan explained the situation that this may
> > be a sign of a filesystem corruption introduced by a faulty mkfs.btrfs
> > version.
> > 
> > I had this yesterday with one of my BTRFS filesystems and these type
> > mismatch things didn´t go away with btrfs check --repair from btrfs-tools
> > 4.3.1.
> > 
> > Also
> > 
> >> Next:
> >> 
> >> Couldn't find free space inode 1
> >> checking free space cache [o]
> >> parent transid verify failed on 241287168 wanted 33554449 found 17
> >> Ignoring transid failure
> >> checkingunresolved ref dir 418890 index 0 namelen 15 name umq-onetouch.ko
> >> filetype 1 errors 6, no dir index, no inode ref
> >> 
> >>unresolved ref dir 418890 index 8 namelen 15 name ums-onetouch.ko
> >> 
> >> filetype 1 errors 1, no dir item
> > 
> > the further errors and
> > 
> > […]
> > 
> >> Once it finished, I tried a recovery mount, which went ok.  Since I
> >> already
> >> had a backup of my data, I tried to run btrfs repair:
> >> […]
> >> Then it got stuck on the same error as before.  It appears to be a loop:
> >> 
> >> parent transid verify failed on 1016217600 wanted 33556071 found 1639
> >> Ignoring transid failure
> >> parent transid verify failed on 1016217600 wanted 33556071 found 1639
> >> Ignoring transid failure
> >> ...
> > 
> > […]
> > 
> >> It's been running this way for over an hour now, never moving on from the
> >> same errors & the same couple of files.  I'm going to let it run
> >> overnight,
> >> but I don't have a lot of confidence that it will ever exit this loop. 
> >> Any
> >> recommendations as what I should do next?
> > 
> > is a clear sign to me that it likely is more effective to just redo the
> > filesystem from scratch than trying to repair it with the limited
> > capabilities of current btrfs check command.
> > 
> > So when you have a good backup of your data and want to be confident of a
> > sound structure of the filesytem, redo it from scratch with latest
> > btrfs-tools 4.3.1.
> > 
> > Thats at least my take on this.
> > 
> > Thanks,


-- 
Martin


Re: btrfs scrub failing

2016-01-03 Thread Martin Steigerwald
On Saturday, 2 January 2016, 18:27:16 CET John Center wrote:
> Hi Martin,
> 
> > On Jan 2, 2016, at 6:41 AM, Martin Steigerwald <mar...@lichtvoll.de>
> 
> wrote:
> > On Saturday, 2 January 2016, 11:35:51 CET Martin Steigerwald wrote:
> >> On Friday, 1 January 2016, 20:04:43 CET John Center wrote:
> >>> Hi Duncan,
> >>> 
> >>>> On Fri, Jan 1, 2016 at 12:05 PM, Duncan <1i5t5.dun...@cox.net> wrote:
> >>> 
> >>>> John Center posted on Fri, 01 Jan 2016 11:41:20 -0500 as excerpted:
> >>> Where do I go from here?
> >> 
> >> These and the other errors point at an issue with the filesystem
> 
> structure.
> 
> >> As I never had to deal with that, I can only give generic advice:
> >> 
> >> 1) Use latest stable btrfs-progs.
> 
> I'm in the process of creating a live USB to boot with.  Since I'm running
> mdadm (imsm) I need to purge dmraid & install mdadm to assemble the drives
> first.  I also need to put the latest version of btrfs-progs on it, too.
>  (As a side note, things have been getting flaky with my workstation, so I
> guess I'm either going to fix this or rebuild it.  I have the data files
> backed up, it's just a pain to have to recreate the system again.)

I think you could even just boot a GRML from USB stick, grab the sources via 
git clone and compile them there. Shouldn't take long. I use:

git://git.kernel.org/pub/scm/linux/kernel/git/kdave/btrfs-progs.git

which has 4.3.1.
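
Roughly, on a Debian-based live system – the package names are from memory, 
so treat them as an assumption and adjust as needed:

apt-get install git build-essential autoconf automake pkg-config \
    uuid-dev libblkid-dev zlib1g-dev liblzo2-dev libattr1-dev e2fslibs-dev
git clone git://git.kernel.org/pub/scm/linux/kernel/git/kdave/btrfs-progs.git
cd btrfs-progs
./autogen.sh && ./configure && make
./btrfs --version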

> >> 2) Umount the filesystem and run
> >> 
> >> btrfs check (maybe with -p)
> >> 
> >> When it finds some errors, proceed with the following steps:
> >> 
> >> Without --repair or some of the other options that modify things it is
> 
> read
> 
> >> only.
> >> 
> >> 3) If you can still access all the files, first thing to do is: rsync or
> >> otherwise backup them all to a different location, before attempting
> >> anything to repair the issue.
> >> 
> >> 4) If you can´t access some files, you may try to use btrfs restore for
> >> restoring them.
> >> 
> >> 5) Then, if you made sure you have an up-to-date backup run
> >> 
> >> btrfs check --repair
> > 
> > Before doing that, review:
> > 
> > https://btrfs.wiki.kernel.org/index.php/Btrfsck
> > 
> > to learn about other options.
> 
> Ok, so "btrfs check -p" first to understand how bad the filesystem is
> corrupted.
> Should I then try to do a recovery mount, or should I run "btrfs check
> --repair -p" to try & fix it?  I'm not sure what a "mount -o ro,recovery"
> does.

Well, if "btrfs check -p" confirms that something needs fixing, you make sure 
you have a working backup *first*, either by rsync or btrfs restore if some 
files are inaccesible, but from what I remember you are able to access all 
files?

Then I´d say give btrfs check --repair a try.

I don´t know much about that mount -o ro,recovery does but from what I 
gathered so far I thought it is only needed when you *can´t* mount the 
filesystem anymore.
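
Syntactically it is just another mount option; with the md device from your 
earlier mail (the mount point is an assumption) it would be something like:

mount -o ro,recovery /dev/md126p2 /mnt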

> If I have to reformat & reinstall Ubuntu, are there any recommended
> mkfs.btrfs options I should use?  Something that might help prevent
> problems in the future? 

Lol. :)

As far as I can see, mkfs.btrfs from btrfs-tools 4.3.1 already sets extref and 
skinny-metadata as well as a 16 KiB node and leaf size by default, so when I 
recreated my /daten BTRFS filesystem due to the extent type mismatch errors 
(see the other thread) I just used mkfs.btrfs -L daten to set a label.
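
For reference, spelling those defaults out explicitly would look roughly like 
this – the device name is made up, -O selects the features, -n the node size:

mkfs.btrfs -L daten -O extref,skinny-metadata -n 16k /dev/mapper/sata-daten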

Thanks,
-- 
Martin


Re: btrfs scrub failing

2016-01-03 Thread Martin Steigerwald
On Sunday, 3 January 2016, 02:02:12 CET John Center wrote:
> Hi Martin & Duncan,

Hi John,

> Since I had a backup of my data, I first ran "btrfs check -p" on the
> unmounted array.  It first found 3 parent transid errors:
> 
> root@ubuntu:~# btrfs check -p /dev/md126p2
> Checking filesystem on /dev/md126p2
> UUID: 9b5a6959-7df1-4455-a643-d369487d24aa
> parent transid verify failed on 97763328 wanted 33736296 found 181864
> ...
> Ignoring transid failure
> parent transid verify failed on 241287168 wanted 33554449 found 17
> ...
> Ignoring transid failure
> parent transid verify failed on 1016217600 wanted 33556071 found 1639
> ...
> Ignoring transid failure
> 
> Then a huge number of bad extent mismatches:
> 
> bad extent [29360128, 29376512), type mismatch with chunk
> bad extent [29376512, 29392896), type mismatch with chunk
> ...
> bad extent [1039947448320, 1039947464704), type mismatch with chunk
> bad extent [1039948005376, 1039948021760), type mismatch with chunk

Due to these I recommend you redo the BTRFS filesystem using your backup. See 
the other thread where Duncan explained that this may be a sign of filesystem 
corruption introduced by a faulty mkfs.btrfs version.

I had this yesterday with one of my BTRFS filesystems, and these type mismatch 
errors didn't go away with btrfs check --repair from btrfs-tools 4.3.1.

Also

> Next:
> 
> Couldn't find free space inode 1
> checking free space cache [o]
> parent transid verify failed on 241287168 wanted 33554449 found 17
> Ignoring transid failure
> checkingunresolved ref dir 418890 index 0 namelen 15 name umq-onetouch.ko
> filetype 1 errors 6, no dir index, no inode ref
> unresolved ref dir 418890 index 8 namelen 15 name ums-onetouch.ko
> filetype 1 errors 1, no dir item

the further errors and

[…]
> Once it finished, I tried a recovery mount, which went ok.  Since I already
> had a backup of my data, I tried to run btrfs repair:
> […]
> Then it got stuck on the same error as before.  It appears to be a loop:
> 
> parent transid verify failed on 1016217600 wanted 33556071 found 1639
> Ignoring transid failure
> parent transid verify failed on 1016217600 wanted 33556071 found 1639
> Ignoring transid failure
> ...
[…]
> It's been running this way for over an hour now, never moving on from the
> same errors & the same couple of files.  I'm going to let it run overnight,
> but I don't have a lot of confidence that it will ever exit this loop.  Any
> recommendations as what I should do next?

is a clear sign to me that it is likely more effective to just redo the 
filesystem from scratch than to try to repair it with the limited capabilities 
of the current btrfs check command.

So if you have a good backup of your data and want to be confident of a sound 
filesystem structure, redo it from scratch with the latest btrfs-tools 4.3.1.

That's at least my take on this.

Thanks,
-- 
Martin


Re: Unrecoverable fs corruption?

2016-01-03 Thread Martin Steigerwald
On Sunday, 3 January 2016, 15:53:56 CET you wrote:
> [1] Fat-fingering a deletion:  My own brown-bag "I became an admin that 
> day" case was running a script, unfortunately as root, that I was 
> debugging, where I did an rm -rf $somevar/*, with $somevar assigned 
> earlier, only either the somevar in the assignment or the somevar in the 
> rm line was typoed, so the var ended up empty and the command ended up as 
> rm -rf /*. ...
> 
> I was *SO* glad I had a backup, not just a raid1, that day!

Epic.

That's the one case GNU rm doesn't cover yet.

It refuses rm -rf . , rm -rf .. and rm -rf / (unless you give a special 
argument), but there is not much it can do about rm -rf /*, as the shell 
expands this before handing it to the command.
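
The only guard I know of at that level is in the shell itself. A small sketch: 
the ${var:?} expansion aborts the command when the variable is unset or empty, 
instead of silently expanding to /*:

rm -rf -- "${somevar:?is unset or empty, refusing to run}"/*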

Thanks,
-- 
Martin


Re: btrfs scrub failing

2016-01-02 Thread Martin Steigerwald
On Friday, 1 January 2016, 20:04:43 CET John Center wrote:
> Hi Duncan,
> 
> On Fri, Jan 1, 2016 at 12:05 PM, Duncan <1i5t5.dun...@cox.net> wrote:
> > John Center posted on Fri, 01 Jan 2016 11:41:20 -0500 as excerpted:
> >> If this doesn't resolve the problem, what would you recommend my next
> >> steps should be?  I've been hesitant to run too many of the btrfs-tools,
> >> mainly because I don't want to accidentally screw things up & I don't
> >> always know how to interpret the results. (I ran btrfs-debug-tree,
> >> hoping something obvious would show up.  Big mistake. )
> > 
> > LOLed at that debug-tree remark.  Been there (with other tools) myself.
> > 
> > Well, I'm hoping someone who had the problem can confirm whether it's
> > fixed in current kernels (scrub is one of those userspace commands that's
> > mostly just a front-end to the kernel code which does the real work, so
> > kernel version is the important thing for scrub).  I'm guessing so, and
> > that you'll find the problem gone in 4.3.
> > 
> > We'll cross the not-gone bridge if we get to it, but again, if the other
> > people who had the similar problem can confirm whether it disappeared for
> > them with the new kernel, it would help a lot, as there were enough such
> > reports that if it's the same problem and still there for everyone (which
> > I doubt as I expect there'd still be way more posts about it if so, but
> > confirmation's always good), nothing to do but wait for a fix, while if
> > not, and you still have your problem, then it's a different issue and the
> > devs will need to work with you on a fix specific to your problem.
> 
> Ok, I'm at the next bridge. :-(  I upgraded the kernel to 4.4rc7 from
> the Ubuntu Mainline archive & I just ran the scrub:
> 
> john@mariposa:~$ sudo /sbin/btrfs scrub start -BdR /dev/md125p2
> ERROR: scrubbing /dev/md125p2 failed for device id 1: ret=-1, errno=5
> (Input/output error)
> scrub device /dev/md125p2 (id 1) canceled
> scrub started at Fri Jan  1 19:38:21 2016 and was aborted after 00:02:34
> data_extents_scrubbed: 111031
> tree_extents_scrubbed: 104061
> data_bytes_scrubbed: 2549907456
> tree_bytes_scrubbed: 1704935424
> read_errors: 0
> csum_errors: 0
> verify_errors: 0
> no_csum: 1573
> csum_discards: 0
> super_errors: 0
> malloc_errors: 0
> uncorrectable_errors: 0
> unverified_errors: 0
> corrected_errors: 0
> last_physical: 4729667584
> 
> I checked dmesg & this appeared:
> 
> [11428.983355] BTRFS error (device md125p2): parent transid verify
> failed on 241287168 wanted 33554449 found 17
> [11431.028399] BTRFS error (device md125p2): parent transid verify
> failed on 241287168 wanted 33554449 found 17
> 
> Where do I go from here?

These and the other errors point at an issue with the filesystem structure.

As I never had to deal with that, I can only give generic advice:

1) Use latest stable btrfs-progs.

2) Umount the filesystem and run 

btrfs check (maybe with -p)

When it finds some errors, proceed with the following steps:

Without --repair or one of the other options that modify things, btrfs check 
is read-only.

3) If you can still access all the files, the first thing to do is: rsync or 
otherwise back them all up to a different location, before attempting anything 
to repair the issue.

4) If you can't access some files, you may try to use btrfs restore for 
restoring them.

5) Then, if you made sure you have an up-to-date backup run

btrfs check --repair
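
Put together as concrete commands – device and paths are placeholders – the 
sequence above would look roughly like:

umount /mnt/point
btrfs check -p /dev/sdXN                      # read-only inspection
mount -o ro /dev/sdXN /mnt/point              # if it still mounts, copy data off
rsync -aHAX /mnt/point/ /backup/target/
umount /mnt/point
btrfs restore -v /dev/sdXN /backup/restore/   # for files that cannot be read normally
btrfs check --repair /dev/sdXN                # only once the backup is known good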


Also watch out for other guidance you may receive here. This approach is based 
on what I would do – I never had the need to repair a BTRFS filesystem so far.

Thanks,
-- 
Martin


Re: [PATCH] BTRFS: Adds the files and options needed for Hybrid Storage

2016-01-02 Thread Martin Steigerwald
Hello,

On Friday, 1 January 2016, 22:08:32 CET Sanidhya Solanki wrote:
> This patch adds the file required for Hybrid Storage. It contains
> the memory, time and size limits for the cache and the statistics that
> will be provided while the cache is operating.
> It also adds the Makefile changes needed to add the Hybrid Storage.

Is this about what I think it is – using flash as a cache for a BTRFS 
filesystem on a rotational disk? I ask because the last time I saw patches 
regarding this, they consisted of patches to add hot data tracking to VFS and 
BTRFS in order to support setting up an SSD for hot data.

Or is this something different?

Happy New Year and thanks,
Martin

> Signed-off-by: Sanidhya Solanki 
> ---
>  fs/btrfs/Makefile |  2 +-
>  fs/btrfs/cache.c  | 58
> +++ 2 files changed, 59
> insertions(+), 1 deletion(-)
>  create mode 100644 fs/btrfs/cache.c
> 
> diff --git a/fs/btrfs/Makefile b/fs/btrfs/Makefile
> index 6d1d0b9..dc56ae4 100644
> --- a/fs/btrfs/Makefile
> +++ b/fs/btrfs/Makefile
> @@ -9,7 +9,7 @@ btrfs-y += super.o ctree.o extent-tree.o print-tree.o
> root-tree.o dir-item.o \ export.o tree-log.o free-space-cache.o zlib.o
> lzo.o \
>  compression.o delayed-ref.o relocation.o delayed-inode.o scrub.o \
>  reada.o backref.o ulist.o qgroup.o send.o dev-replace.o raid56.o \
> -uuid-tree.o props.o hash.o
> +uuid-tree.o props.o hash.o cache.o
> 
>  btrfs-$(CONFIG_BTRFS_FS_POSIX_ACL) += acl.o
>  btrfs-$(CONFIG_BTRFS_FS_CHECK_INTEGRITY) += check-integrity.o
> diff --git a/fs/btrfs/cache.c b/fs/btrfs/cache.c
> new file mode 100644
> index 000..0ece7a1
> --- /dev/null
> +++ b/fs/btrfs/cache.c
> @@ -0,0 +1,58 @@
> +/*
> + * (c) Sanidhya Solanki, 2016
> + *
> + * Licensed under the FSF's GNU Public License v2 or later.
> + */
> +#include 
> +
> +/* Cache size configuration )in MiB).*/
> +#define MAX_CACHE_SIZE = 1
> +#define MIN_CACHE_SIZE = 10
> +
> +/* Time (in seconds)before retrying to increase the cache size.*/
> +#define CACHE_RETRY = 10
> +
> +/* Space required to be free (in MiB) before increasing the size of the
> + * cache. If cache size is less than cache_grow_limit, a block will be
> freed + * from the cache to allow the cache to continue growning.
> + */
> +#define CACHE_GROW_LIMIT = 100
> +
> +/* Size required to be free (in MiB) after we shrink the cache, so that it
> + * does not grow in size immediately.
> + */
> +#define CACHE_SHRINK_FREE_SPACE_LIMIT = 100
> +
> +/* Age (in seconds) of oldest and newest block in the cache.*/
> +#define MAX_AGE_LIMIT = 300  /* Five Minute Rule recommendation,
> +  * optimum size depends on size of data
> +  * blocks.
> +  */
> +#define MIN_AGE_LIMIT = 15   /* In case of cache stampede.*/
> +
> +/* Memory constraints (in percentage) before we stop caching.*/
> +#define MIN_MEM_FREE = 10
> +
> +/* Cache statistics. */
> +struct cache_stats {
> + u64 cache_size;
> + u64 maximum_cache_size_attained;
> + int cache_hit_rate;
> + int cache_miss_rate;
> + u64 cache_evicted;
> + u64 duplicate_read;
> + u64 duplicate_write;
> + int stats_update_interval;
> +};
> +
> +#define cache_size   CACHE_SIZE /* Current cache size.*/
> +#define max_cache_size   MAX_SIZE /* Max cache limit. */
> +#define min_cache_size   MIN_SIZE /* Min cache limit.*/
> +#define cache_time   MAX_TIME /* Maximum time to keep data in cache.*/
> +#define evicted_csum EVICTED_CSUM/* Checksum of the evited data
> +  * (to avoid repeatedly caching
> +  * data that was just evicted.
> +  */
> +#define read_csumREAD_CSUM /* Checksum of the read data.*/
> +#define write_csum   WRITE_CSUM /* Checksum of the written data.*/
> +#define evict_interval   EVICT_INTERVAL /* Time to keep data before
> eviction.*/


-- 
Martin


Re: btrfs scrub failing

2016-01-02 Thread Martin Steigerwald
On Saturday, 2 January 2016, 11:35:51 CET Martin Steigerwald wrote:
> On Friday, 1 January 2016, 20:04:43 CET John Center wrote:
> > Hi Duncan,
> > 
> > On Fri, Jan 1, 2016 at 12:05 PM, Duncan <1i5t5.dun...@cox.net> wrote:
> > > John Center posted on Fri, 01 Jan 2016 11:41:20 -0500 as excerpted:
> > >> If this doesn't resolve the problem, what would you recommend my next
> > >> steps should be?  I've been hesitant to run too many of the
> > >> btrfs-tools,
> > >> mainly because I don't want to accidentally screw things up & I don't
> > >> always know how to interpret the results. (I ran btrfs-debug-tree,
> > >> hoping something obvious would show up.  Big mistake. )
> > > 
> > > LOLed at that debug-tree remark.  Been there (with other tools) myself.
> > > 
> > > Well, I'm hoping someone who had the problem can confirm whether it's
> > > fixed in current kernels (scrub is one of those userspace commands
> > > that's
> > > mostly just a front-end to the kernel code which does the real work, so
> > > kernel version is the important thing for scrub).  I'm guessing so, and
> > > that you'll find the problem gone in 4.3.
> > > 
> > > We'll cross the not-gone bridge if we get to it, but again, if the other
> > > people who had the similar problem can confirm whether it disappeared
> > > for
> > > them with the new kernel, it would help a lot, as there were enough such
> > > reports that if it's the same problem and still there for everyone
> > > (which
> > > I doubt as I expect there'd still be way more posts about it if so, but
> > > confirmation's always good), nothing to do but wait for a fix, while if
> > > not, and you still have your problem, then it's a different issue and
> > > the
> > > devs will need to work with you on a fix specific to your problem.
> > 
> > Ok, I'm at the next bridge. :-(  I upgraded the kernel to 4.4rc7 from
> > the Ubuntu Mainline archive & I just ran the scrub:
> > 
> > john@mariposa:~$ sudo /sbin/btrfs scrub start -BdR /dev/md125p2
> > ERROR: scrubbing /dev/md125p2 failed for device id 1: ret=-1, errno=5
> > (Input/output error)
> > scrub device /dev/md125p2 (id 1) canceled
> > scrub started at Fri Jan  1 19:38:21 2016 and was aborted after 00:02:34
> > data_extents_scrubbed: 111031
> > tree_extents_scrubbed: 104061
> > data_bytes_scrubbed: 2549907456
> > tree_bytes_scrubbed: 1704935424
> > read_errors: 0
> > csum_errors: 0
> > verify_errors: 0
> > no_csum: 1573
> > csum_discards: 0
> > super_errors: 0
> > malloc_errors: 0
> > uncorrectable_errors: 0
> > unverified_errors: 0
> > corrected_errors: 0
> > last_physical: 4729667584
> > 
> > I checked dmesg & this appeared:
> > 
> > [11428.983355] BTRFS error (device md125p2): parent transid verify
> > failed on 241287168 wanted 33554449 found 17
> > [11431.028399] BTRFS error (device md125p2): parent transid verify
> > failed on 241287168 wanted 33554449 found 17
> > 
> > Where do I go from here?
> 
> These and the other errors point at an issue with the filesystem structure.
> 
> As I never had to deal with that, I can only give generic advice:
> 
> 1) Use latest stable btrfs-progs.
> 
> 2) Umount the filesystem and run
> 
> btrfs check (maybe with -p)
> 
> When it finds some errors, proceed with the following steps:
> 
> Without --repair or some of the other options that modify things it is read
> only.
> 
> 3) If you can still access all the files, first thing to do is: rsync or
> otherwise backup them all to a different location, before attempting
> anything to repair the issue.
> 
> 4) If you can´t access some files, you may try to use btrfs restore for
> restoring them.
> 
> 5) Then, if you made sure you have an up-to-date backup run
> 
> btrfs check --repair

Before doing that, review:

https://btrfs.wiki.kernel.org/index.php/Btrfsck

to learn about other options.

Thanks,
-- 
Martin


Re: Btrfs Check - "type mismatch with chunk"

2016-01-02 Thread Martin Steigerwald
On Thursday, 24 December 2015, 23:41:06 CET Duncan wrote:
> Zach Fuller posted on Thu, 24 Dec 2015 13:15:22 -0600 as excerpted:
> > I am currently running btrfs on a 2TB GPT drive. The drive is working
> > fine, still mounts correctly, and I have experienced no data corruption.
> > Whenever I run "btrfs check" on the drive, it returns 100,000+ messages
> > stating "bad extent [###, ###), type mismatch with chunk". Whenever I
> > try to run "btrfs check --repair" it says that it has fixed the errors,
> > but whenever I run "btrfs check" again, the errors return. Should I be
> > worried about data/filesystem corruption,
> > or are these errors meaningless?
> > 
> > I have my data backed up on 2 different drives, so I can afford to lose
> > the entire btrfs drive temporarily.
> > 
> > Here is some info about my system:
> > 
> > $ uname -[r]
> > 4.2.5-1-ARCH
> > 
> > 
> > $ btrfs --version
> > btrfs-progs v4.3.1
> 
> While Chris's reply mentioning a patch is correct, that's not the whole
> story and I suspect you have a problem, as the patch is in the userspace
> 4.3.1 you're running.
> 
> How long have you had the filesystem?  Was it likely created with the
> mkfs.btrfs from btrfs-progs v4.1.1 (July, 2015) as I suspect?  If so, you
> have a problem, as that mkfs.btrfs was buggy and created invalid
> filesystems.
> 
> As you have two separate backups and you're not experiencing corruption
> or the like so far, you should be fine, but if the filesystem was created
> with that buggy mkfs.btrfs, you need to wipe and recreate it as soon as
> possible, because it's unstable in its current state and could fail, with
> massive corruption, at any point.  Unfortunately, the bug created
> filesystems so broken that (last I knew anyway, and your experience
> agrees) there's no way btrfs check --repair can fix them.  The only way
> to fix it is to blow away the filesystem and recreate it with a
> mkfs.btrfs that doesn't have the bug that 4.1.1 did.  Your 4.3.1 should
> be fine.
> 
> (The patch Chris mentioned was to btrfs check, as the first set of
> patches to it to allow it to detect the problem triggered all sorts of
> false-positives and pretty much everybody was flagged as having the
> problem.  I believe that was patched in the 4.2 series, however, and
> you're running 4.3.1, so you should have that patch and the reports
> shouldn't be false positives.  Tho if you didn't create the filesystem
> with the buggy mkfs.btrfs from v4.1.1, there's likely some other problem
> to root out, but I'm guessing you did, and thus have the bad filesystem
> the patched btrfs check is designed to report, and that report is indeed
> valid.)

I have this issue as well on one of the filesystems I just checked in order to 
describe to John how to have a go at fixing his filesystem. A ton of these 
with different numbers:

bad extent [347045888, 347062272), type mismatch with chunk

It doesn't go away with running btrfs check --repair on it.

The last scrub was from yesterday and returned 0 errors. I will rerun a scrub 
after the repair attempt, and if it is good I will play it safe and redo the 
filesystem from scratch.

It may be that I used a mkfs.btrfs from 4.1.1 to create it. It would be good 
if the version of the tool that created the fs were stored in the fs itself, 
so one could know for sure. It is the youngest BTRFS filesystem on my laptop 
SSDs; I created it around April 2014 though.

Thanks,
-- 
Martin


Re: btrfs scrub failing

2016-01-01 Thread Martin Steigerwald
On Friday, 1 January 2016, 13:20:49 CET John Center wrote:
> > On Jan 1, 2016, at 12:41 PM, Martin Steigerwald <mar...@lichtvoll.de>
> > wrote:
> > On Friday, 1 January 2016, 11:41:20 CET John Center wrote:
[…]
> >>> On Jan 1, 2016, at 12:55 AM, Duncan <1i5t5.dun...@cox.net> wrote:
> >>> 
> >>> A couple months ago, which would have made it around the 4.2 kernel
> >>> you're running (with 4.3 being current and 4.4 nearly out), there were a
> >>> number of similar scrub aborted reports on the list.
> >> 
> >> I must have missed that, I'll check the list again to try & understand
> >> the
> >> issue better.
> > 
> > I had repeatedly failing scrubs as mentioned in another thread here, until
> > I used 4.4 kernel. With 4.3 kernel scrub also didn´t work. I didn´t use
> > the debug options you used above and I am not sure whether I had this
> > scrub issue with 4.2 already, so I am not sure it has been the same
> > issue. But you may need to run 4.4 kernel in order to get scrub working
> > again.
> > 
> > See my thread "[4.3-rc4] scrubbing aborts before finishing" for details.
> 
> I was afraid of this. I just read your thread. I generally try to stay away
> from kernels so new, but I may have to try it. Was there any reason you
> didn't go to 4.1 instead?  (I run win8.1 in VirtualBox 5.0.12, when I need
> to run somethings under Windows. I'd have to wait until 4.4 is released &
> supported to do that.)

So far 4.4-rc6 is pretty stable for me. And I think it's almost at release, as 
rc7 is out already.

The reason for not going with 4.1? Well, that would be downgrading, wouldn't 
it? But sure, it is also an option.

Virtualbox 5.0.12-dfsg-2 as packaged by Debian runs fine here with 4.4-rc6.

Thanks,
-- 
Martin


still kworker at 100% cpu in all of device size allocated with chunks situations with write load

2016-01-01 Thread Martin Steigerwald
First: Happy New Year to you!

Second: Take your time. I know it's holidays for many. For me it means I 
easily have time to follow up on this.

On Wednesday, 16 December 2015, 09:20:45 CET Qu Wenruo wrote:
> Chris Mason wrote on 2015/12/15 16:59 -0500:
> > On Mon, Dec 14, 2015 at 10:08:16AM +0800, Qu Wenruo wrote:
> >> Martin Steigerwald wrote on 2015/12/13 23:35 +0100:
> >>> Hi!
> >>> 
> >>> For me it is still not production ready.
> >> 
> >> Yes, this is the *FACT* and not everyone has a good reason to deny it.
> >> 
> >>> Again I ran into:
> >>> 
> >>> btrfs kworker thread uses up 100% of a Sandybridge core for minutes on
> >>> random write into big file
> >>> https://bugzilla.kernel.org/show_bug.cgi?id=90401
> >> 
> >> Not sure about guideline for other fs, but it will attract more dev's
> >> attention if it can be posted to maillist.
> >> 
> >>> No matter whether SLES 12 uses it as default for root, no matter whether
> >>> Fujitsu and Facebook use it: I will not let this onto any customer
> >>> machine
> >>> without lots and lots of underprovisioning and rigorous free space
> >>> monitoring. Actually I will renew my recommendations in my trainings to
> >>> be careful with BTRFS.
> >>> 
> >>>  From my experience the monitoring would check for:
> >>> merkaba:~> btrfs fi show /home
> >>> Label: 'home'  uuid: […]
> >>> 
> >>>  Total devices 2 FS bytes used 156.31GiB
> >>>  devid1 size 170.00GiB used 164.13GiB path
> >>>  /dev/mapper/msata-home
> >>>  devid2 size 170.00GiB used 164.13GiB path
> >>>  /dev/mapper/sata-home
> >>> 
> >>> If "used" is same as "size" then make big fat alarm. It is not
> >>> sufficient for it to happen. It can run for quite some time just fine
> >>> without any issues, but I never have seen a kworker thread using 100%
> >>> of one core for extended period of time blocking everything else on the
> >>> fs without this condition being met.>> 
> >> And specially advice on the device size from myself:
> >> Don't use devices over 100G but less than 500G.
> >> Over 100G will leads btrfs to use big chunks, where data chunks can be at
> >> most 10G and metadata to be 1G.
> >> 
> >> I have seen a lot of users with about 100~200G device, and hit unbalanced
> >> chunk allocation (10G data chunk easily takes the last available space
> >> and
> >> makes later metadata no where to store)
> > 
> > Maybe we should tune things so the size of the chunk is based on the
> > space remaining instead of the total space?
> 
> Submitted such patch before.
> David pointed out that such behavior will cause a lot of small
> fragmented chunks at last several GB.
> Which may make balance behavior not as predictable as before.
> 
> 
> At least, we can just change the current 10% chunk size limit to 5% to
> make such problem less easier to trigger.
> It's a simple and easy solution.
> 
> Another cause of the problem is, we understated the chunk size change
> for fs at the borderline of big chunk.
> 
> For 99G, its chunk size limit is 1G, and it needs 99 data chunks to
> fully cover the fs.
> But for 100G, it only needs 10 chunks to covert the fs.
> And it need to be 990G to match the number again.
> 
> The sudden drop of chunk number is the root cause.
> 
> So we'd better reconsider both the big chunk size limit and chunk size
> limit to find a balanaced solution for it.

Did you come to any conclusion here? Is there anything I can change with my 
home BTRFS filesystem to try to find out what works? The challenge here is 
that it doesn't happen under well-defined circumstances. So far I only know a 
necessary condition, but not a sufficient condition for it to happen.

Another user ran into the issue and reported his findings in the bug report:

https://bugzilla.kernel.org/show_bug.cgi?id=90401#c14

Thanks,
-- 
Martin


Re: btrfs scrub failing

2016-01-01 Thread Martin Steigerwald
On Friday, 1 January 2016, 11:41:20 CET John Center wrote:

Happy New Year!

> > On Jan 1, 2016, at 12:55 AM, Duncan <1i5t5.dun...@cox.net> wrote:
> > 
> >
> > John Center posted on Thu, 31 Dec 2015 11:20:28 -0500 as excerpted:
> > 
> >
> >> I run a weekly scrub, using Marc Merlin's btrfs-scrub script.
> >> Usually, it completes without a problem, but this week it failed.  I ran
> >>
> >> the scrub manually & it stops shortly:
> >> 
> >>
> >> john@mariposa:~$ sudo /sbin/btrfs scrub start -BdR /dev/md124p2
> >> ERROR: scrubbing /dev/md124p2 failed for device id 1:
> >> ret=-1, errno=5 (Input/output error)
> >> scrub device /dev/md124p2 (id 1) canceled
> >> scrub started at Thu Dec 31 00:26:34 2015
> >> and was aborted after 00:01:29 [...]
> >
> > 
> >
> >> My Ubuntu 14.04 workstation is using the 4.2 kernel (Wily).
> >> I'm using btrfs-tools v4.3.1. [...]
> >
> > 
> >
> > A couple months ago, which would have made it around the 4.2 kernel 
> > you're running (with 4.3 being current and 4.4 nearly out), there were a 
> > number of similar scrub aborted reports on the list.
> >
> > 
> 
> I must have missed that, I'll check the list again to try & understand the
> issue better. 

I had repeatedly failing scrubs, as mentioned in another thread here, until I 
used the 4.4 kernel. With the 4.3 kernel scrub also didn't work. I didn't use 
the debug options you used above, and I am not sure whether I already had this 
scrub issue with 4.2, so I am not sure it is the same issue. But you may need 
to run a 4.4 kernel in order to get scrub working again.

See my thread "[4.3-rc4] scrubbing aborts before finishing" for details.

Thanks,
-- 
Martin


Re: btrfs und lvm-cache?

2015-12-23 Thread Martin Steigerwald
On Wednesday, 23 December 2015, 11:45:28 CET Neuer User wrote:
> Hello

Hi.

> I want to setup a small homeserver, based on a HP Microserver Gen8 (4GB
> RAM, 2x3TB HDD + 1x120GB SSD) and Proxmox as distro.
> 
> The server will be used to host a (small) number of virtual machines,
> most of them being LXC containers, few being KVM machines. One of the
> LXC containers will host a fileserver with app 1 TB of data and another
> one a backup system for the desktops / laptops in my household, thus
> probably holding quite a lot of files. The lxc containers will use the
> filesystem of the proxmox host, the KVM machines probably raw disk files
> (or qcow2).
> 
> I would like to combine high data integrity with some speed, so I
> thought of the following layout:
> 
> - both hdd and ssd in one LVM VG
> - one LV on each hdd, containing a btrfs filesystem
> - both btrfs LV configured as RAID1
> - the single SDD used as a LVM cache device for both HDD LVs to speed up
> random access, where possible
> 
> Now, I wonder if that is a good architecture to go for. Any input on
> that? Is btrfs the right way to go for, or should I better go for ZFS
> (and purchase some more gigs of RAM)?
> 
> Will there be any problems arising from the lvmcache? btrfs only sees
> the HDDs, LVM does the SDD handling.

As far as I understand, this way you basically lose the RAID 1 semantics of 
BTRFS. While the data is redundant on the HDDs, it is not redundant on the 
SSD. It may work for a pure read / write-through cache, but with write-back 
caching you definitely lose the data integrity protection a RAID 1 gives you.

Of course, you can use two SSDs and have them work as a RAID 1 as well.

There is a patch set for in-BTRFS SSD caching. It consists of a patch set to 
add hot data tracking to VFS and a patch set adding support in BTRFS. But I 
haven't seen anything of these in quite some time.

Happy Christmas,
-- 
Martin


Re: [4.3-rc4] scrubbing aborts before finishing (SOLVED)

2015-12-17 Thread Martin Steigerwald
On Wednesday, 16 December 2015, 00:18:53 CET Martin Steigerwald wrote:
> On Monday, 14 December 2015, 08:59:59 CET Martin Steigerwald wrote:
> > On Wednesday, 25 November 2015, 16:35:39 CET you wrote:
> > > On Saturday, 31 October 2015, 12:10:37 CET Martin Steigerwald wrote:
> > > > On Thursday, 22 October 2015, 10:41:15 CET Martin Steigerwald
> wrote:
> > > > > I get this:
> > > > > 
> > > > > merkaba:~> btrfs scrub status -d /
> > > > > scrub status for […]
> > > > > scrub device /dev/mapper/sata-debian (id 1) history
> > > > > 
> > > > > scrub started at Thu Oct 22 10:05:49 2015 and was aborted
> > > > > after
> > > > > 00:00:00
> > > > > total bytes scrubbed: 0.00B with 0 errors
> > > > > 
> > > > > scrub device /dev/dm-2 (id 2) history
> > > > > 
> > > > > scrub started at Thu Oct 22 10:05:49 2015 and was aborted
> > > > > after
> > > > > 00:01:30
> > > > > total bytes scrubbed: 23.81GiB with 0 errors
> > > > > 
> > > > > For / scrub aborts for sata SSD immediately.
> > > > > 
> > > > > For /home scrub aborts for both SSDs at some time.
[…]
> I now have 4.4-rc5 running, the boot crash I had appears to be fixed. Oh,
> and I see that scrubbing / at leasted worked now:
> 
> merkaba:~> btrfs scrub status -d /
> scrub status for […]
> scrub device /dev/dm-5 (id 1) history
> scrub started at Wed Dec 16 00:13:20 2015 and finished after
> 00:01:42 total bytes scrubbed: 23.94GiB with 0 errors
> scrub device /dev/mapper/msata-debian (id 2) history
> scrub started at Wed Dec 16 00:13:20 2015 and finished after
> 00:01:34 total bytes scrubbed: 23.94GiB with 0 errors
> 
> I will check with other BTRFS filesystems tomorrow and report back whether
> scrubbing is stable for me again.

This appears to be fixed with 4.4-rc5. Thank you!!!

Thanks,
-- 
Martin


Re: Still not production ready

2015-12-15 Thread Martin Steigerwald
On Tuesday, 15 December 2015, 16:59:58 CET Chris Mason wrote:
> On Mon, Dec 14, 2015 at 10:08:16AM +0800, Qu Wenruo wrote:
> > Martin Steigerwald wrote on 2015/12/13 23:35 +0100:
> > >Hi!
> > >
> > >For me it is still not production ready.
> > 
> > Yes, this is the *FACT* and not everyone has a good reason to deny it.
> > 
> > >Again I ran into:
> > >
> > >btrfs kworker thread uses up 100% of a Sandybridge core for minutes on
> > >random write into big file
> > >https://bugzilla.kernel.org/show_bug.cgi?id=90401
> > 
> > Not sure about guideline for other fs, but it will attract more dev's
> > attention if it can be posted to maillist.
> > 
> > >No matter whether SLES 12 uses it as default for root, no matter whether
> > >Fujitsu and Facebook use it: I will not let this onto any customer
> > >machine
> > >without lots and lots of underprovisioning and rigorous free space
> > >monitoring. Actually I will renew my recommendations in my trainings to
> > >be careful with BTRFS.
> > >
> > > From my experience the monitoring would check for:
> > >merkaba:~> btrfs fi show /home
> > >Label: 'home'  uuid: […]
> > >
> > > Total devices 2 FS bytes used 156.31GiB
> > > devid1 size 170.00GiB used 164.13GiB path
> > > /dev/mapper/msata-home
> > > devid2 size 170.00GiB used 164.13GiB path
> > > /dev/mapper/sata-home
> > >
> > >If "used" is same as "size" then make big fat alarm. It is not sufficient
> > >for it to happen. It can run for quite some time just fine without any
> > >issues, but I never have seen a kworker thread using 100% of one core
> > >for extended period of time blocking everything else on the fs without
> > >this condition being met.> 
> > And specially advice on the device size from myself:
> > Don't use devices over 100G but less than 500G.
> > Over 100G will leads btrfs to use big chunks, where data chunks can be at
> > most 10G and metadata to be 1G.
> > 
> > I have seen a lot of users with about 100~200G device, and hit unbalanced
> > chunk allocation (10G data chunk easily takes the last available space and
> > makes later metadata no where to store)
> 
> Maybe we should tune things so the size of the chunk is based on the
> space remaining instead of the total space?

Still, on my filesystem there was over 1 GiB free in the metadata chunks, so…

… my theory still is: BTRFS has trouble finding free space inside chunks at 
some point.

> > And unfortunately, your fs is already in the dangerous zone.
> > (And you are using RAID1, which means it's the same as one 170G btrfs with
> > SINGLE data/meta)
> > 
> > >In addition to that last time I tried it aborts scrub any of my BTRFS
> > >filesstems. Reported in another thread here that got completely ignored
> > >so
> > >far. I think I could go back to 4.2 kernel to make this work.
> 
> We'll pick this thread up again, the ones that get fixed the fastest are
> the ones that we can easily reproduce.  The rest need a lot of think
> time.

I understand. Maybe I just wanted to see at least some sort of a reaction.

I now have 4.4-rc5 running; the boot crash I had appears to be fixed. Oh, and 
I see that scrubbing / at least worked now:

merkaba:~> btrfs scrub status -d /
scrub status for […]
scrub device /dev/dm-5 (id 1) history
scrub started at Wed Dec 16 00:13:20 2015 and finished after 00:01:42
total bytes scrubbed: 23.94GiB with 0 errors
scrub device /dev/mapper/msata-debian (id 2) history
scrub started at Wed Dec 16 00:13:20 2015 and finished after 00:01:34
total bytes scrubbed: 23.94GiB with 0 errors

Okay, I will test the other ones tomorrow, so maybe this one is fixed 
meanwhile.

Yay!

Thanks,
-- 
Martin


Re: [4.3-rc4] scrubbing aborts before finishing (probably solved)

2015-12-15 Thread Martin Steigerwald
On Monday, 14 December 2015, 08:59:59 CET Martin Steigerwald wrote:
> On Wednesday, 25 November 2015, 16:35:39 CET you wrote:
> > On Saturday, 31 October 2015, 12:10:37 CET Martin Steigerwald wrote:
> > > On Thursday, 22 October 2015, 10:41:15 CET Martin 
Steigerwald wrote:
> > > > I get this:
> > > > 
> > > > merkaba:~> btrfs scrub status -d /
> > > > scrub status for […]
> > > > scrub device /dev/mapper/sata-debian (id 1) history
> > > > 
> > > > scrub started at Thu Oct 22 10:05:49 2015 and was aborted
> > > > after
> > > > 00:00:00
> > > > total bytes scrubbed: 0.00B with 0 errors
> > > > 
> > > > scrub device /dev/dm-2 (id 2) history
> > > > 
> > > > scrub started at Thu Oct 22 10:05:49 2015 and was aborted
> > > > after
> > > > 00:01:30
> > > > total bytes scrubbed: 23.81GiB with 0 errors
> > > > 
> > > > For / scrub aborts for sata SSD immediately.
> > > > 
> > > > For /home scrub aborts for both SSDs at some time.
> > > > 
> > > > merkaba:~> btrfs scrub status -d /home
> > > > scrub status for […]
> > > > scrub device /dev/mapper/msata-home (id 1) history
> > > > 
> > > > scrub started at Thu Oct 22 10:09:37 2015 and was aborted
> > > > after
> > > > 00:01:31
> > > > total bytes scrubbed: 22.03GiB with 0 errors
> > > > 
> > > > scrub device /dev/dm-3 (id 2) history
> > > > 
> > > > scrub started at Thu Oct 22 10:09:37 2015 and was aborted
> > > > after
> > > > 00:03:34
> > > > total bytes scrubbed: 53.30GiB with 0 errors
> > > > 
> > > > Also single volume BTRFS is affected:
> > > > 
> > > > merkaba:~> btrfs scrub status /daten
> > > > scrub status for […]
> > > > 
> > > > scrub started at Thu Oct 22 10:36:38 2015 and was aborted
> > > > after
> > > > 00:00:00
> > > > total bytes scrubbed: 0.00B with 0 errors
> > > > 
> > > > No errors in dmesg, btrfs device stat or smartctl -a.
> > > > 
> > > > Any known issue?
> > > 
> > > I am still seeing this in 4.3-rc7. It happens so that on one SSD BTRFS
> > > doesn´t even start scrubbing. But in the end it aborts it scrubbing
> > > anyway.
> > > 
> > > I do not see any other issue so far. But I would really like to be able
> > > to
> > > scrub my BTRFS filesystems completely again. Any hints? Any further
> > > information needed?
> > > 
> > > merkaba:~> btrfs scrub status -d /
> > > scrub status for […]
> > > scrub device /dev/dm-5 (id 1) history
> > > 
> > > scrub started at Sat Oct 31 11:58:45 2015, running for 00:00:00
> > > total bytes scrubbed: 0.00B with 0 errors
> > > 
> > > scrub device /dev/mapper/msata-debian (id 2) status
> > > 
> > > scrub started at Sat Oct 31 11:58:45 2015, running for 00:00:20
> > > total bytes scrubbed: 5.27GiB with 0 errors
> > > 
> > > merkaba:~> btrfs scrub status -d /
> > > scrub status for […]
> > > scrub device /dev/dm-5 (id 1) history
> > > 
> > > scrub started at Sat Oct 31 11:58:45 2015, running for 00:00:00
> > > total bytes scrubbed: 0.00B with 0 errors
> > > 
> > > scrub device /dev/mapper/msata-debian (id 2) status
> > > 
> > > scrub started at Sat Oct 31 11:58:45 2015, running for 00:00:25
> > > total bytes scrubbed: 6.59GiB with 0 errors
> > > 
> > > merkaba:~> btrfs scrub status -d /
> > > scrub status for […]
> > > scrub device /dev/dm-5 (id 1) history
> > > 
> > > scrub started at Sat Oct 31 11:58:45 2015, running for 00:00:00
> > > total bytes scrubbed: 0.00B with 0 errors
> > > 
> > > scrub device /dev/mapper/msata-debian (id 2) status
> > > 
> > > scrub started at Sat Oct 31 11:58:45 2015, running for 00:01:25
> > > total bytes scrubbed: 21.97GiB with 0 errors
> > > 
> > > merkaba:~> btrfs scrub status -d /
> > > scrub status for […]
> > > scrub device /dev/dm-5 (id 1

safety of journal based fs (was: Re: still kworker at 100% cpu…)

2015-12-14 Thread Martin Steigerwald
Hi!

Using a different subject for the journal fs related things which are off 
topic, but still interesting. Might make sense to move to fsdevel-ml or ext4/
XFS mailing lists? Otherwise, I suggest we focus on BTRFS here. Still wanted 
to reply.

Am Montag, 14. Dezember 2015, 16:48:58 CET schrieb Qu Wenruo:
> Martin Steigerwald wrote on 2015/12/14 09:18 +0100:
> > Am Montag, 14. Dezember 2015, 10:08:16 CET schrieb Qu Wenruo:
> >> Martin Steigerwald wrote on 2015/12/13 23:35 +0100:
[…]
> >>> I am seriously consider to switch to XFS for my production laptop again.
> >>> Cause I never saw any of these free space issues with any of the XFS or
> >>> Ext4 filesystems I used in the last 10 years.
> >> 
> >> Yes, xfs and ext4 is very stable for normal use case.
> >> 
> >> But at least, I won't recommend xfs yet, and considering the nature or
> >> journal based fs, I'll recommend backup power supply in crash recovery
> >> for both of them.
> >> 
> >> Xfs already messed up several test environment of mine, and an
> >> unfortunate double power loss has destroyed my whole /home ext4
> >> partition years ago.
> > 
> > Wow. I have never seen this. Actual I teach journal filesystems being
> > quite
> > safe on power losses as long as cache flushes (former barrier)
> > functionality is active and working. With one caveat: It relies on one
> > sector being either completely written or not. I never seen any
> > scientific proof for that on usual storage devices.
> 
> The journal is used to be safe against power loss.
> That's OK.
> 
> But the problem is, when recovering journal, there is no journal of
> journal, to keep journal recovering safe from power loss.

But the journal itself should be safe, since a journal commit record is a single 
sector? Of course, for the last changes that never got a journal commit it is a 
different story: that data is simply gone.

> And that's the advantage of COW file system, no need of journal completely.
> Although Btrfs is less safe than stable journal based fs yet.
> 
> >> [xfs story]
> >> After several crash, xfs makes several corrupted file just to 0 size.
> >> Including my kernel .git directory. Then I won't trust it any longer.
> >> No to mention that grub2 support for xfs v5 is not here yet.
> > 
> > That is no filesystem metadata structure crash. It is a known issue with
> > delayed allocation. Same with Ext4. I teach this as well in my performance
> > analysis & tuning course.
> 
> Unfortunately, it's not about delayed allocation, as it's not a new
> file, it's file already here with contents in previous transaction.
> The workload should only rewrite the files.(Not sure though)

As far as I know, the overwrite-after-truncate case is also related to the 
delayed allocation / deferred write behaviour: the file has already been truncated 
to zero bytes in the journal while the new data has not been written out yet.

But well, for Ext4 / XFS there is no need to reallocate anything in this case.

> And for ext4 case, I'll see corrupted files, but not truncated to 0 size.
> So IMHO it may be related to xfs recovery behavior.
> But not sure as I never read xfs codes.

Journals only provide *metadata* consistency, unless you use Ext4 with 
data=journal, which is supposed to be much slower, but in some workloads is 
actually faster. Even Andrew Morton had no explanation for that, however I 
do have an idea about it. data=journal is also interesting if you put the journal 
of a harddisk-based Ext4 onto an SSD or an SSD RAID 1 or so.
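
For reference, roughly what I mean, as commands (the device names are 
placeholders and this is a sketch from memory, not a tested recipe; the 
filesystem has to be unmounted for the journal changes):

# mount an Ext4 filesystem with full data journalling instead of the
# default data=ordered
mount -o data=journal /dev/sdb1 /mnt/data

# or move the journal of an existing Ext4 to an SSD via an external
# journal device
mke2fs -O journal_dev /dev/ssd-vg/ext4journal          # create the external journal
tune2fs -O ^has_journal /dev/sdb1                      # drop the internal journal
tune2fs -J device=/dev/ssd-vg/ext4journal /dev/sdb1    # attach the external one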

> > Also BTRFS in principle has this issue I believe.  As far as I am aware it
> > has a fix for the rename case, not using delayed allocation in the case.
> > Due to its COW nature it may not be affected at all however, I don´t
> > know.
> Anyway for rewrite case, none of these fs should truncate fs size to 0.
> However, it seems xfs doesn't follow the way though.
> Although I'm not 100% sure, as after that disaster I reinstall my test
> box using ext4.
> 
> (Maybe next time I should try btrfs, at least when it fails, I have my
> chance to submit new patches to kernel or btrfsck)

I do think it is the applications doing that when overwriting a file, for example 
when rewriting a config file. They either write a new file and rename it over the 
old one, or truncate the file to zero bytes and rewrite it in place.
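
A minimal sketch of the first variant from the shell, assuming a coreutils 
where sync accepts a file argument (file name and content are made up):

# write the new content to a temporary file in the same directory
printf '%s\n' "$NEW_CONTENT" > /etc/foo.conf.tmp
# flush the temporary file's data to disk before renaming
sync /etc/foo.conf.tmp
# atomically replace the old file: after a crash you get either the old
# or the new version, never a zero-byte one
mv /etc/foo.conf.tmp /etc/foo.conf
# strictly speaking the containing directory should be synced as well
sync /etc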

Of course, it is different for databases or other files that are written into 
without being rewritten as a whole. There you need data=journal on Ext4. XFS does 
not guarantee file data consistency at all in that case, unless the application 
serializes its changes properly with fsync() by using an in-application journal 
for the data to write.

> >> [ext4 story]
> >> For ext4, when recovering my /home partition after a power loss, a new
> >> power loss hap

still kworker at 100% cpu in all of device size allocated with chunks situations with write load (was: Re: Still not production ready)

2015-12-14 Thread Martin Steigerwald
Am Sonntag, 13. Dezember 2015, 15:19:14 CET schrieb Marc MERLIN:
> On Sun, Dec 13, 2015 at 11:35:08PM +0100, Martin Steigerwald wrote:
> > Hi!
> > 
> > For me it is still not production ready. Again I ran into:
> > 
> > btrfs kworker thread uses up 100% of a Sandybridge core for minutes on
> > random write into big file
> > https://bugzilla.kernel.org/show_bug.cgi?id=90401
> 
> Sorry you're having issues. I haven't seen this before myself.
> I couldn't find the kernel version you're using in your Email or the bug
> you filed (quick scan).
> 
> That's kind of important :)

I definitely know this much. :) It happened with 4.3 yesterday. The other 
kernel version was 3.18. The information should be in the bug report: 3.18 
as mentioned in the Kernel Version field, and 4.3 as mentioned in my last 
comment of the bug report.

I think the scrubbing issue has been there since 4.3; I believe I also saw it with 
4.4-rc2/rc4, but I did not go back to check more thoroughly. I have not reported 
the scrubbing issue in Bugzilla yet, as I got no feedback on my mailing list posts 
so far. I will bump that thread in a moment and suggest we discuss the free space 
issue here and the scrubbing issue in the other thread. I went back to 4.3 because 
4.4-rc2/rc4 does not even boot on my machine most of the time. I also reported 
that (a BTRFS-unrelated issue).

Thanks,
-- 
Martin


Re: [4.3-rc4] scrubbing aborts before finishing

2015-12-14 Thread Martin Steigerwald
Am Mittwoch, 25. November 2015, 16:35:39 CET schrieben Sie:
> Am Samstag, 31. Oktober 2015, 12:10:37 CET schrieb Martin Steigerwald:
> > Am Donnerstag, 22. Oktober 2015, 10:41:15 CET schrieb Martin Steigerwald:
> > > I get this:
> > > 
> > > merkaba:~> btrfs scrub status -d /
> > > scrub status for […]
> > > scrub device /dev/mapper/sata-debian (id 1) history
> > > 
> > > scrub started at Thu Oct 22 10:05:49 2015 and was aborted after
> > > 00:00:00
> > > total bytes scrubbed: 0.00B with 0 errors
> > > 
> > > scrub device /dev/dm-2 (id 2) history
> > > 
> > > scrub started at Thu Oct 22 10:05:49 2015 and was aborted after
> > > 00:01:30
> > > total bytes scrubbed: 23.81GiB with 0 errors
> > > 
> > > For / scrub aborts for sata SSD immediately.
> > > 
> > > For /home scrub aborts for both SSDs at some time.
> > > 
> > > merkaba:~> btrfs scrub status -d /home
> > > scrub status for […]
> > > scrub device /dev/mapper/msata-home (id 1) history
> > > 
> > > scrub started at Thu Oct 22 10:09:37 2015 and was aborted after
> > > 00:01:31
> > > total bytes scrubbed: 22.03GiB with 0 errors
> > > 
> > > scrub device /dev/dm-3 (id 2) history
> > > 
> > > scrub started at Thu Oct 22 10:09:37 2015 and was aborted after
> > > 00:03:34
> > > total bytes scrubbed: 53.30GiB with 0 errors
> > > 
> > > Also single volume BTRFS is affected:
> > > 
> > > merkaba:~> btrfs scrub status /daten
> > > scrub status for […]
> > > 
> > > scrub started at Thu Oct 22 10:36:38 2015 and was aborted after
> > > 00:00:00
> > > total bytes scrubbed: 0.00B with 0 errors
> > > 
> > > No errors in dmesg, btrfs device stat or smartctl -a.
> > > 
> > > Any known issue?
> > 
> > I am still seeing this in 4.3-rc7. It happens so that on one SSD BTRFS
> > doesn´t even start scrubbing. But in the end it aborts it scrubbing
> > anyway.
> > 
> > I do not see any other issue so far. But I would really like to be able to
> > scrub my BTRFS filesystems completely again. Any hints? Any further
> > information needed?
> > 
> > merkaba:~> btrfs scrub status -d /
> > scrub status for […]
> > scrub device /dev/dm-5 (id 1) history
> > 
> > scrub started at Sat Oct 31 11:58:45 2015, running for 00:00:00
> > total bytes scrubbed: 0.00B with 0 errors
> > 
> > scrub device /dev/mapper/msata-debian (id 2) status
> > 
> > scrub started at Sat Oct 31 11:58:45 2015, running for 00:00:20
> > total bytes scrubbed: 5.27GiB with 0 errors
> > 
> > merkaba:~> btrfs scrub status -d /
> > scrub status for […]
> > scrub device /dev/dm-5 (id 1) history
> > 
> > scrub started at Sat Oct 31 11:58:45 2015, running for 00:00:00
> > total bytes scrubbed: 0.00B with 0 errors
> > 
> > scrub device /dev/mapper/msata-debian (id 2) status
> > 
> > scrub started at Sat Oct 31 11:58:45 2015, running for 00:00:25
> > total bytes scrubbed: 6.59GiB with 0 errors
> > 
> > merkaba:~> btrfs scrub status -d /
> > scrub status for […]
> > scrub device /dev/dm-5 (id 1) history
> > 
> > scrub started at Sat Oct 31 11:58:45 2015, running for 00:00:00
> > total bytes scrubbed: 0.00B with 0 errors
> > 
> > scrub device /dev/mapper/msata-debian (id 2) status
> > 
> > scrub started at Sat Oct 31 11:58:45 2015, running for 00:01:25
> > total bytes scrubbed: 21.97GiB with 0 errors
> > 
> > merkaba:~> btrfs scrub status -d /
> > scrub status for […]
> > scrub device /dev/dm-5 (id 1) history
> > 
> > scrub started at Sat Oct 31 11:58:45 2015 and was aborted after
> > 
> > 00:00:00 total bytes scrubbed: 0.00B with 0 errors
> > scrub device /dev/mapper/msata-debian (id 2) history
> > 
> > scrub started at Sat Oct 31 11:58:45 2015 and was aborted after
> > 
> > 00:01:32 total bytes scrubbed: 23.63GiB with 0 errors
> > 
> > 
> > For the sake of it I am going to btrfs check one of the filesystem where
> > BTRFS aborts scrubbing (which is all of the laptop filesystems, not only
> > the RAID 1 one).
> > 
> > I will use the /daten filesystem as I can unmount it dur

still kworker at 100% cpu in all of device size allocated with chunks situations with write load (was: Re: Still not production ready)

2015-12-14 Thread Martin Steigerwald
Am Montag, 14. Dezember 2015, 10:08:16 CET schrieb Qu Wenruo:
> Martin Steigerwald wrote on 2015/12/13 23:35 +0100:
> > Hi!
> > 
> > For me it is still not production ready.
> 
> Yes, this is the *FACT* and not everyone has a good reason to deny it.
> 
> > Again I ran into:
> > 
> > btrfs kworker thread uses up 100% of a Sandybridge core for minutes on
> > random write into big file
> > https://bugzilla.kernel.org/show_bug.cgi?id=90401
> 
> Not sure about guideline for other fs, but it will attract more dev's
> attention if it can be posted to maillist.

I did, as mentioned in the bug report:

BTRFS free space handling still needs more work: Hangs again
Martin Steigerwald | 26 Dec 14:37 2014
http://permalink.gmane.org/gmane.comp.file-systems.btrfs/41790

> > No matter whether SLES 12 uses it as default for root, no matter whether
> > Fujitsu and Facebook use it: I will not let this onto any customer machine
> > without lots and lots of underprovisioning and rigorous free space
> > monitoring. Actually I will renew my recommendations in my trainings to
> > be careful with BTRFS.
> > 
> >  From my experience the monitoring would check for:
> > merkaba:~> btrfs fi show /home
> > Label: 'home'  uuid: […]
> > 
> >  Total devices 2 FS bytes used 156.31GiB
> >  devid1 size 170.00GiB used 164.13GiB path
> >  /dev/mapper/msata-home
> >  devid2 size 170.00GiB used 164.13GiB path
> >  /dev/mapper/sata-home
> > 
> > If "used" is same as "size" then make big fat alarm. It is not sufficient
> > for it to happen. It can run for quite some time just fine without any
> > issues, but I never have seen a kworker thread using 100% of one core for
> > extended period of time blocking everything else on the fs without this
> > condition being met.
> And specially advice on the device size from myself:
> Don't use devices over 100G but less than 500G.
> Over 100G will leads btrfs to use big chunks, where data chunks can be
> at most 10G and metadata to be 1G.
> 
> I have seen a lot of users with about 100~200G device, and hit
> unbalanced chunk allocation (10G data chunk easily takes the last
> available space and makes later metadata no where to store)

Interesting, but in my case there is still quite some free space in the already 
allocated metadata chunks. Anyway, I did have ENOSPC issues when trying to 
balance the chunks.
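
For the record, the usual way to hand mostly-empty chunks back to the 
unallocated pool is a filtered balance along these lines (the usage threshold 
is just an example value, and on a filesystem with all device space allocated 
even this can fail with ENOSPC):

# rewrite data chunks that are at most 20% filled, so their space
# becomes unallocated again
btrfs balance start -dusage=20 /home
# same idea for metadata chunks, if needed
btrfs balance start -musage=20 /home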

> And unfortunately, your fs is already in the dangerous zone.
> (And you are using RAID1, which means it's the same as one 170G btrfs
> with SINGLE data/meta)

Well, I know that for any FS it is not recommended to let it run full and that one 
should leave at least about 10-15% free. While it is not 10-15% anymore here, it is 
still a whopping 11-12 GiB of free space. I would accept somewhat slower operation 
in this case, but not a kworker at 100% for about 10-30 seconds blocking everything 
else going on on the filesystem. For whatever reason Plasma seems to access the fs 
on almost every action I do with it, so during that time not even panels slide out 
anymore and the activity switcher stops working.

> > In addition to that last time I tried it aborts scrub any of my BTRFS
> > filesstems. Reported in another thread here that got completely ignored so
> > far. I think I could go back to 4.2 kernel to make this work.
> 
> Unfortunately, this happens a lot of times, even you posted it to mail list.
> Devs here are always busy locating bugs or adding new features or
> enhancing current behavior.
> 
> So *PLEASE* be patient about such slow response.

Okay, thanks at least for acknowledging this. I will try to be even more 
patient.
 
> BTW, you may not want to revert to 4.2 until some bug fix is backported
> to 4.2.
> As qgroup rework in 4.2 has broken delayed ref and caused some scrub
> bugs. (My fault)

Hm, well, scrubbing does not work for me either, but only since 4.3 / 4.4-rc2/rc4. 
I just bumped the thread:

Re: [4.3-rc4] scrubbing aborts before finishing

by replying, well, by replying a third time to it (not a fourth, I miscounted :). 

> > I am not going to bother to go into more detail on any on this, as I get
> > the impression that my bug reports and feedback get ignored. So I spare
> > myself the time to do this work for now.
> > 
> > 
> > Only thing I wonder now whether this all could be cause my /home is
> > already
> > more than one and a half year old. Maybe newly created filesystems are
> > created in a way that prevents these issues? But it already has a nice
> > global reserve:
> > 
> > merkaba:~> btrfs fi df /
> > Data, RAID1: total=27.98GiB, used=24.07GiB
> > System, RAID1: total=19.00MiB, used=16.00KiB

Re: still kworker at 100% cpu in all of device size allocated with chunks situations with write load

2015-12-14 Thread Martin Steigerwald
Hi Qu.

I reply to the journal fs things in a mail with a different subject.

Am Montag, 14. Dezember 2015, 16:48:58 CET schrieb Qu Wenruo:
> Martin Steigerwald wrote on 2015/12/14 09:18 +0100:
> > Am Montag, 14. Dezember 2015, 10:08:16 CET schrieb Qu Wenruo:
> >> Martin Steigerwald wrote on 2015/12/13 23:35 +0100:
[…]
> >> GlobalReserve is just a reserved space *INSIDE* metadata for some corner
> >> case. So its profile is always single.
> >> 
> >> The real problem is, how we represent it in btrfs-progs.
> >> 
> >> If it output like below, I think you won't complain about it more:
> >>   > merkaba:~> btrfs fi df /
> >>   > Data, RAID1: total=27.98GiB, used=24.07GiB
> >>   > System, RAID1: total=19.00MiB, used=16.00KiB
> >>   > Metadata, RAID1: total=2.00GiB, used=728.80MiB
> >> 
> >> Or
> >> 
> >>   > merkaba:~> btrfs fi df /
> >>   > Data, RAID1: total=27.98GiB, used=24.07GiB
> >>   > System, RAID1: total=19.00MiB, used=16.00KiB
> >>   > Metadata, RAID1: total=2.00GiB, used=(536.80 + 192.00)MiB
> >>   > 
> >>   >  \ GlobalReserve: total=192.00MiB, used=0.00B
> > 
> > Oh, the global reserve is *inside* the existing metadata chunks? Thats
> > interesting. I didn´t know that.
> 
> And I have already submit btrfs-progs patch to change the default output
> of 'fi df'.
> 
> Hopes to solve the problem.

Nice, thank you. That clarifies it quite a bit. I always wondered why it is 
single. On which device is it allocated in a RAID 1? Also, can the data stored in 
there temporarily be recreated in case a device is lost? If not, BTRFS would not 
guarantee that one device of a RAID 1 can fail at any time.

Ciao,
-- 
Martin


Still not production ready

2015-12-13 Thread Martin Steigerwald
Hi!

For me it is still not production ready. Again I ran into:

btrfs kworker thread uses up 100% of a Sandybridge core for minutes on random 
write into big file
https://bugzilla.kernel.org/show_bug.cgi?id=90401


No matter whether SLES 12 uses it as default for root, no matter whether 
Fujitsu and Facebook use it: I will not let this onto any customer machine 
without lots and lots of underprovisioning and rigorous free space monitoring. 
Actually I will renew my recommendations in my trainings to be careful with 
BTRFS.

From my experience the monitoring would check for:

merkaba:~> btrfs fi show /home
Label: 'home'  uuid: […]
Total devices 2 FS bytes used 156.31GiB
devid1 size 170.00GiB used 164.13GiB path /dev/mapper/msata-home
devid2 size 170.00GiB used 164.13GiB path /dev/mapper/sata-home

If "used" is same as "size" then make big fat alarm. It is not sufficient for 
it to happen. It can run for quite some time just fine without any issues, but 
I never have seen a kworker thread using 100% of one core for extended period 
of time blocking everything else on the fs without this condition being met.
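
A tiny sketch of such a check (it parses the human-readable output of 
btrfs fi show and compares the rounded values as strings, so treat it as an 
illustration rather than a robust monitoring plugin; the mount point is a 
placeholder):

#!/bin/sh
# big fat alarm when "used" equals "size" on any device of the filesystem
btrfs filesystem show /home | awk '
    $1 == "devid" && $4 == $6 {
        print "ALARM: " $8 " has all of its " $4 " allocated"
        bad = 1
    }
    END { exit bad }'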


In addition to that, the last time I tried, scrub aborts on any of my BTRFS 
filesystems. I reported this in another thread here, which got completely ignored 
so far. I think I could go back to a 4.2 kernel to make this work.


I am not going to bother to go into more detail on any of this, as I get the 
impression that my bug reports and feedback get ignored. So I spare myself the 
time to do this work for now.


The only thing I wonder now is whether all of this could be because my /home is 
already more than one and a half years old. Maybe newly created filesystems are 
laid out in a way that prevents these issues? But it already has a nice global 
reserve:

merkaba:~> btrfs fi df /
Data, RAID1: total=27.98GiB, used=24.07GiB
System, RAID1: total=19.00MiB, used=16.00KiB
Metadata, RAID1: total=2.00GiB, used=536.80MiB
GlobalReserve, single: total=192.00MiB, used=0.00B


Actually, when I see that this free space thing is still not fixed for good, I 
wonder whether it is fixable at all. Is this an inherent issue of BTRFS, or, 
more generally, of COW filesystem design?

I think it got somewhat better. It took much longer to come into that state 
again than last time, but still, blocking like this is *no* option for a 
*production ready* filesystem.



I am seriously considering switching to XFS for my production laptop again, because 
I never saw any of these free space issues with any of the XFS or Ext4 filesystems 
I have used in the last 10 years.

Thanks,
-- 
Martin


Re: shall distros run btrfsck on boot?

2015-11-25 Thread Martin Steigerwald
Am Mittwoch, 25. November 2015, 07:32:34 CET schrieb Austin S Hemmelgarn:
> On 2015-11-24 17:26, Eric Sandeen wrote:
> > On 11/24/15 2:38 PM, Austin S Hemmelgarn wrote:
> >> if the system was
> >> shut down cleanly, you're fine barring software bugs, but if it
> >> crashed, you should be running a check on the FS.
> > 
> > Um, no...
> > 
> > The *entire point* of having a journaling filesystem is that after a
> > crash or power loss, a journal replay on next mount will bring the
> > metadata into a consistent state.
> 
> OK, first, that was in reference to BTRFS, not ext4, and BTRFS is a COW
> filesystem, not a journaling one, which is an important distinction as
> mentioned by Hugo in his reply.  Second, there are two reasons that you
> should be running a check even of a journaled filesystem when the system
> crashes (this also applies to COW filesystems, and anything else that
> relies on atomicity of write operations for consistency):
> 
> 1. Disks don't atomically write anything bigger than a sector, and may
> not even atomically write the sector itself.  This means that it's
> possible to get a partial write to the journal, which in turn has
> significant potential to put the metadata in an inconsistent state when
> the journal gets replayed (IIRC, ext4 has a journal_checksum mount
> option that is supposed to mitigate this possibility).  This sounds like
> something that shouldn't happen all that often, but on a busy
> filesystem, the probability is exactly proportionate to the size of the
> journal relative to the size of the FS.
> 
> 2. If the system crashed, all code running on it immediately before the
> crash is instantly suspect, and you have no way to know for certain that
> something didn't cause random garbage to be written to the disk.  On top
> of this, hardware is potentially suspect, and when your hardware is
> misbehaving, then all bets as to consistency are immediately off.

In the case of shaky hardware an fsck run can report bogus results, i.e. problems 
where there are none, or vice versa. If I suspected defective memory or a defective 
controller, I would only check the device on different hardware, especially before 
attempting to repair any possible issues.


-- 
Martin


Re: [4.3-rc4] scrubbing aborts before finishing

2015-11-25 Thread Martin Steigerwald
Am Samstag, 31. Oktober 2015, 12:10:37 CET schrieb Martin Steigerwald:
> Am Donnerstag, 22. Oktober 2015, 10:41:15 CET schrieb Martin Steigerwald:
> > I get this:
> > 
> > merkaba:~> btrfs scrub status -d /
> > scrub status for […]
> > scrub device /dev/mapper/sata-debian (id 1) history
> > 
> > scrub started at Thu Oct 22 10:05:49 2015 and was aborted after
> > 00:00:00
> > total bytes scrubbed: 0.00B with 0 errors
> > 
> > scrub device /dev/dm-2 (id 2) history
> > 
> > scrub started at Thu Oct 22 10:05:49 2015 and was aborted after
> > 00:01:30
> > total bytes scrubbed: 23.81GiB with 0 errors
> > 
> > For / scrub aborts for sata SSD immediately.
> > 
> > For /home scrub aborts for both SSDs at some time.
> > 
> > merkaba:~> btrfs scrub status -d /home
> > scrub status for […]
> > scrub device /dev/mapper/msata-home (id 1) history
> > 
> > scrub started at Thu Oct 22 10:09:37 2015 and was aborted after
> > 00:01:31
> > total bytes scrubbed: 22.03GiB with 0 errors
> > 
> > scrub device /dev/dm-3 (id 2) history
> > 
> > scrub started at Thu Oct 22 10:09:37 2015 and was aborted after
> > 00:03:34
> > total bytes scrubbed: 53.30GiB with 0 errors
> > 
> > Also single volume BTRFS is affected:
> > 
> > merkaba:~> btrfs scrub status /daten
> > scrub status for […]
> > 
> > scrub started at Thu Oct 22 10:36:38 2015 and was aborted after
> > 00:00:00
> > total bytes scrubbed: 0.00B with 0 errors
> > 
> > No errors in dmesg, btrfs device stat or smartctl -a.
> > 
> > Any known issue?
> 
> I am still seeing this in 4.3-rc7. It happens so that on one SSD BTRFS
> doesn´t even start scrubbing. But in the end it aborts it scrubbing anyway.
> 
> I do not see any other issue so far. But I would really like to be able to
> scrub my BTRFS filesystems completely again. Any hints? Any further
> information needed?
> 
> merkaba:~> btrfs scrub status -d /
> scrub status for […]
> scrub device /dev/dm-5 (id 1) history
> scrub started at Sat Oct 31 11:58:45 2015, running for 00:00:00
> total bytes scrubbed: 0.00B with 0 errors
> scrub device /dev/mapper/msata-debian (id 2) status
> scrub started at Sat Oct 31 11:58:45 2015, running for 00:00:20
> total bytes scrubbed: 5.27GiB with 0 errors
> merkaba:~> btrfs scrub status -d /
> scrub status for […]
> scrub device /dev/dm-5 (id 1) history
> scrub started at Sat Oct 31 11:58:45 2015, running for 00:00:00
> total bytes scrubbed: 0.00B with 0 errors
> scrub device /dev/mapper/msata-debian (id 2) status
> scrub started at Sat Oct 31 11:58:45 2015, running for 00:00:25
> total bytes scrubbed: 6.59GiB with 0 errors
> merkaba:~> btrfs scrub status -d /
> scrub status for […]
> scrub device /dev/dm-5 (id 1) history
> scrub started at Sat Oct 31 11:58:45 2015, running for 00:00:00
> total bytes scrubbed: 0.00B with 0 errors
> scrub device /dev/mapper/msata-debian (id 2) status
> scrub started at Sat Oct 31 11:58:45 2015, running for 00:01:25
> total bytes scrubbed: 21.97GiB with 0 errors
> merkaba:~> btrfs scrub status -d /
> scrub status for […]
> scrub device /dev/dm-5 (id 1) history
> scrub started at Sat Oct 31 11:58:45 2015 and was aborted after
> 00:00:00 total bytes scrubbed: 0.00B with 0 errors
> scrub device /dev/mapper/msata-debian (id 2) history
> scrub started at Sat Oct 31 11:58:45 2015 and was aborted after
> 00:01:32 total bytes scrubbed: 23.63GiB with 0 errors
> 
> 
> For the sake of it I am going to btrfs check one of the filesystem where
> BTRFS aborts scrubbing (which is all of the laptop filesystems, not only
> the RAID 1 one).
> 
> I will use the /daten filesystem as I can unmount it during laptop runtime
> easily. There scrubbing aborts immediately:
> 
> merkaba:~> btrfs scrub start /daten
> scrub started on /daten, fsid […] (pid=13861)
> merkaba:~> btrfs scrub status /daten
> scrub status for […]
> scrub started at Sat Oct 31 12:04:25 2015 and was aborted after
> 00:00:00 total bytes scrubbed: 0.00B with 0 errors
> 
> It is single device:
> 
> merkaba:~> btrfs fi sh /daten
> Label: 'daten'  uuid: […]
> Total devices 1 FS bytes used 227.23GiB
> devid1 size 230.00GiB used 230.00GiB path
> /dev/mapper/msata-daten
> 
> btrfs-progs v4.2.2
> merkaba:~> btrfs fi df /daten
> Data, single: total=228.99GiB, used=226.7

Re: [RFC][PATCH 00/12] Enhanced file stat system call

2015-11-24 Thread Martin Steigerwald
Am Dienstag, 24. November 2015, 00:13:08 CET schrieb Christoph Hellwig:
> On Fri, Nov 20, 2015 at 05:19:31PM +0100, Martin Steigerwald wrote:
> > I know its mostly relevant for just for FAT32, but on any account rather
> > than trying to write 4 GiB and then file, it would be good to at some
> > time get a dialog at the beginning of the copy.
> 
> pathconf/fpathconf is supposed to handle that.  It's not super pretty
> but part of Posix.  Linus hates it, but it might be time to give it
> another try.

It might be interesting for BTRFS as well, to be able to ask what amount of 
free space there currently is *at* a given path, because with BTRFS and 
subvolumes this may differ between different paths. Even though it is not 
implemented yet, it may be possible in the future to have one subvolume with a 
RAID 1 profile and one with a RAID 0 profile.

That said, an application that wants to make sure it can write a certain amount of 
data can use fallocate, and that is the only reliable way to ensure it that I know 
of. This can become tedious for several files, but there is no problem in principle 
with preallocating all files if their sizes are known. Even rsync or desktop 
environments could work like that: first fallocate everything, then, only if that 
succeeds, start actually copying data. Disadvantage: on an aborted copy you end up 
with all files at their correct sizes and no easy indication of where the copy 
stopped.
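
A rough two-pass sketch of that idea from the shell (the directories are 
placeholders, and on a COW filesystem the reservation is not a hard guarantee 
for the later writes, so this only illustrates the idea):

# pass 1: reserve space for every target file first, abort if anything fails
for f in /src/*; do
    fallocate -l "$(stat -c %s "$f")" "/dst/$(basename "$f")" || exit 1
done
# pass 2: only then copy the data, without truncating the preallocated files
for f in /src/*; do
    dd if="$f" of="/dst/$(basename "$f")" bs=1M conv=notrunc status=none
done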

Thanks,
-- 
Martin


Unclear error message when running btrfs check on a mountpoint

2015-10-31 Thread Martin Steigerwald
Hi!

With kernel 4.3-rc7 and btrfs-progs 4.2.2 I get:

merkaba:~> btrfs check /daten
Superblock bytenr is larger than device size
Couldn't open file system


It took me a moment to see that I used a mountpoint and that this may be the
reason for the error message.

Maybe check for a device file as argument and give a clearer error message
in this case?
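
Just to illustrate the kind of check I mean, as a shell sketch (this is of 
course not the actual btrfs-progs code, only an illustration):

arg=$1
if [ ! -b "$arg" ] && [ ! -f "$arg" ]; then
    echo "ERROR: '$arg' is neither a block device nor an image file (did you pass a mount point?)" >&2
    exit 1
fi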

Thanks,
-- 
Martin


Re: behavior of BTRFS in relation to inodes when moving/copying files between filesystems

2015-10-31 Thread Martin Steigerwald
Am Dienstag, 13. Oktober 2015, 12:39:12 CET schrieben Sie:
> Hi!
> 
> With BTRFS to XFS/Ext4 the inode number of the target file stays the same in 
> with both cp and mv case (/mnt/zeit is a freshly created XFS in this example):
> 
> merkaba:~> ls -li foo /mnt/zeit/moo
> 6609270  foo
>  99  /mnt/zeit/moo
> merkaba:~> cp foo /mnt/zeit/moo
> merkaba:~> ls -li foo /mnt/zeit/moo
> 6609270 8 foo
>  99  /mnt/zeit/moo
> merkaba:~> cp -p foo /mnt/zeit/moo  
> merkaba:~> ls -li foo /mnt/zeit/moo
> 6609270 foo
>  99 /mnt/zeit/moo
> merkaba:~> mv foo /mnt/zeit/moo
> merkaba:~> ls -lid /mnt/zeit/moo
> 99 -rw-r--r-- 1 root root 6 Okt 13 12:28 /mnt/zeit/moo
> 
> 
> With BTRFS as target filesystem however in the mv case I get a new inode:
> 
> merkaba:~> ls -li foo /home/moo
>  6609289 -rw-r--r-- 1 root root 6 Okt 13 12:34 foo
> 16476276 -rw-r--r-- 1 root root 6 Okt 13 12:34 /home/moo
> merkaba:~> cp foo /home/moo
> merkaba:~> ls -li foo /home/moo
>  6609289 -rw-r--r-- 1 root root 6 Okt 13 12:34 foo
> 16476276 -rw-r--r-- 1 root root 6 Okt 13 12:34 /home/moo
> merkaba:~> cp -p foo /home/moo 
> merkaba:~> ls -li foo /home/moo
>  6609289 -rw-r--r-- 1 root root 6 Okt 13 12:34 foo
> 16476276 -rw-r--r-- 1 root root 6 Okt 13 12:34 /home/moo
> merkaba:~> mv foo /home/moo
> merkaba:~> ls -li /home/moo 
> 16476280 -rw-r--r-- 1 root root 6 Okt 13 12:34 /home/moo
> 
> 
> Is this intentional and/or somehow related to the copy on write specifics of 
> the filesystem?
> 
> I think even with COW it can just overwrite the existing file instead of 
> removing the old one and creating a new one – but it wouldn´t give much of a 
> benefit unless the target file is nocow.
> 
> (Also I thought only certain other utilities had supercow powers, but well 
> BTRFS seems to have them as well :)

Does anyone have any idea?

Thanks,
-- 
Martin


Re: [4.3-rc4] scrubbing aborts before finishing

2015-10-31 Thread Martin Steigerwald
Am Donnerstag, 22. Oktober 2015, 10:41:15 CET schrieb Martin Steigerwald:
> I get this:
> 
> merkaba:~> btrfs scrub status -d /   
> scrub status for […]
> scrub device /dev/mapper/sata-debian (id 1) history
> scrub started at Thu Oct 22 10:05:49 2015 and was aborted after 
> 00:00:00
> total bytes scrubbed: 0.00B with 0 errors
> scrub device /dev/dm-2 (id 2) history
> scrub started at Thu Oct 22 10:05:49 2015 and was aborted after 
> 00:01:30
> total bytes scrubbed: 23.81GiB with 0 errors
> 
> For / scrub aborts for sata SSD immediately.
> 
> For /home scrub aborts for both SSDs at some time.
> 
> merkaba:~> btrfs scrub status -d /home
> scrub status for […]
> scrub device /dev/mapper/msata-home (id 1) history
> scrub started at Thu Oct 22 10:09:37 2015 and was aborted after 
> 00:01:31
> total bytes scrubbed: 22.03GiB with 0 errors
> scrub device /dev/dm-3 (id 2) history
> scrub started at Thu Oct 22 10:09:37 2015 and was aborted after 
> 00:03:34
> total bytes scrubbed: 53.30GiB with 0 errors
> 
> Also single volume BTRFS is affected:
> 
> merkaba:~> btrfs scrub status /daten
> scrub status for […]
> scrub started at Thu Oct 22 10:36:38 2015 and was aborted after 
> 00:00:00
> total bytes scrubbed: 0.00B with 0 errors
> 
> 
> No errors in dmesg, btrfs device stat or smartctl -a.
> 
> Any known issue?

I am still seeing this in 4.3-rc7. What happens is that on one SSD BTRFS
does not even start scrubbing, but in the end it aborts the scrub anyway.

I do not see any other issue so far. But I would really like to be able to
scrub my BTRFS filesystems completely again. Any hints? Any further
information needed? 

merkaba:~> btrfs scrub status -d /
scrub status for […]
scrub device /dev/dm-5 (id 1) history
scrub started at Sat Oct 31 11:58:45 2015, running for 00:00:00
total bytes scrubbed: 0.00B with 0 errors
scrub device /dev/mapper/msata-debian (id 2) status
scrub started at Sat Oct 31 11:58:45 2015, running for 00:00:20
total bytes scrubbed: 5.27GiB with 0 errors
merkaba:~> btrfs scrub status -d /
scrub status for […]
scrub device /dev/dm-5 (id 1) history
scrub started at Sat Oct 31 11:58:45 2015, running for 00:00:00
total bytes scrubbed: 0.00B with 0 errors
scrub device /dev/mapper/msata-debian (id 2) status
scrub started at Sat Oct 31 11:58:45 2015, running for 00:00:25
total bytes scrubbed: 6.59GiB with 0 errors
merkaba:~> btrfs scrub status -d /
scrub status for […]
scrub device /dev/dm-5 (id 1) history
scrub started at Sat Oct 31 11:58:45 2015, running for 00:00:00
total bytes scrubbed: 0.00B with 0 errors
scrub device /dev/mapper/msata-debian (id 2) status
scrub started at Sat Oct 31 11:58:45 2015, running for 00:01:25
total bytes scrubbed: 21.97GiB with 0 errors
merkaba:~> btrfs scrub status -d /
scrub status for […]
scrub device /dev/dm-5 (id 1) history
scrub started at Sat Oct 31 11:58:45 2015 and was aborted after 00:00:00
total bytes scrubbed: 0.00B with 0 errors
scrub device /dev/mapper/msata-debian (id 2) history
scrub started at Sat Oct 31 11:58:45 2015 and was aborted after 00:01:32
total bytes scrubbed: 23.63GiB with 0 errors


For the sake of it I am going to btrfs check one of the filesystems where
BTRFS aborts scrubbing (which is all of the laptop's filesystems, not only
the RAID 1 one).

I will use the /daten filesystem, as I can unmount it easily during laptop
runtime. There scrubbing aborts immediately:

merkaba:~> btrfs scrub start /daten 
scrub started on /daten, fsid […] (pid=13861)
merkaba:~> btrfs scrub status /daten
scrub status for […]
scrub started at Sat Oct 31 12:04:25 2015 and was aborted after 00:00:00
total bytes scrubbed: 0.00B with 0 errors

It is single device:

merkaba:~> btrfs fi sh /daten
Label: 'daten'  uuid: […]
Total devices 1 FS bytes used 227.23GiB
devid1 size 230.00GiB used 230.00GiB path /dev/mapper/msata-daten

btrfs-progs v4.2.2
merkaba:~> btrfs fi df /daten
Data, single: total=228.99GiB, used=226.79GiB
System, single: total=4.00MiB, used=48.00KiB
Metadata, single: total=1.01GiB, used=449.50MiB
GlobalReserve, single: total=160.00MiB, used=0.00B


I do not see any output in btrfs check that points to any issue:

merkaba:~> btrfs check /dev/msata/daten
Checking filesystem on /dev/msata/daten
UUID: 7918274f-e2ec-4983-bbb0-aa93ef95fcf7
checking extents
checking free space cache
checking fs roots
checking csums
checking root refs
found 243936530607 bytes used err is 0
total csum bytes: 237758932
total tree bytes: 471384064
total fs tree bytes: 116473856
total extent tree bytes: 78544896
btree space waste bytes: 57523323
file data blocks allocated: 422700576768
 refer
