mer: all the above statements relate to the conception and
understanding of quotas, not to be confused with qgroups.
--
Tomasz Pala
only expectation (except for worldwide peace and other
unrealistic ones) would be to stop using "quotas", "subvolume quotas"
and "qgroups" interchangeably in the btrfs context, as IMvHO these are not
plain, well-known "quotas".
--
Tomasz Pala
ommand btrfs qgroup(8)"
- they are the same... just completely different from traditional "quotas".
My suggestion would be to completely remove the standalone "quota" word
from btrfs documentation - there is no "quota", just "subvolume quota"
or "qgroup" supported.
--
Tomasz Pala
reflecting anything valuable, unless the problems with extent
fragmentation are already resolved somehow?
So IMHO the current quotas are:
- not discoverable for the user (shared->exclusive transition of my data by
  someone else's action),
- not reliable for the sysadmin (an offensive write pattern by any user can
  allocate virtually any space despite the quotas).
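The shared->exclusive transition complained about above can be shown with a
toy model (pure illustration: the block numbers and the helper are made up,
this is not qgroup code):

```python
# Toy model of the complaint above: blocks counted as "shared" with a
# snapshot silently become "exclusive" (and chargeable to me) the moment
# the OTHER side drops its reference. Not real qgroup accounting code.
def usage(my_blocks, other_blocks):
    """Split my blocks into shared-with-other and exclusive-to-me."""
    return {
        "shared": len(my_blocks & other_blocks),
        "exclusive": len(my_blocks - other_blocks),
    }

mine = {1, 2, 3, 4}
snapshot = {1, 2, 3, 4}        # someone's snapshot references all my blocks
print(usage(mine, snapshot))   # everything shared, nothing exclusive
print(usage(mine, set()))      # snapshot deleted: all 4 blocks now exclusive
```

Nothing the owner of `mine` did changed, yet their exclusive usage jumped
from 0 to 4 blocks, which is the discoverability problem in a nutshell.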
--
Tomasz Pala
unt half of the
data, and twice the data in an opposite scenario (like "dup" profile on
single-drive filesystem).
In short: values representing quotas are user-oriented ("the numbers one
bought"), not storage-oriented ("the numbers they actually occupy").
--
Toma
with the current approach it should be possible to interlace
defragmentation with some kind of naive deduplication; "naive" in the
sense of comparing blocks only within the same in-subvolume paths.
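A minimal sketch of that naive approach (illustrative only: a fixed 4 KiB
block size, plain byte comparison at identical offsets, nothing
btrfs-specific):

```python
# Naive dedup candidate scan as described above: compare two files that
# live at the SAME relative path in two subvolumes, block by block at
# equal offsets, counting blocks that could be shared.
BLOCK = 4096  # illustrative block size

def matching_blocks(path_a, path_b):
    """Count offsets where both files hold an identical block."""
    matches = 0
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        while True:
            block_a, block_b = a.read(BLOCK), b.read(BLOCK)
            if not block_a or not block_b:
                break
            if block_a == block_b:
                matches += 1
    return matches
```

A real implementation would then hand the matching ranges to something like
the clone/dedup ioctls instead of just counting them.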
--
Tomasz Pala <go...@pld-linux.org>
--
To unsubscribe from this list: send the line
On Sun, Feb 18, 2018 at 10:28:02 +0100, Tomasz Pala wrote:
> I've already noticed this problem on February 10th:
> [btrfs-progs] coreutils-like -i parameter, splitting permissions for various
> tasks
>
> In short: not possible. Regular user can only create subvolumes.
Not poss
> After few years not using btrfs (because previously was quite
> unstable) It is really good to see that now I'm not able to crash it.
It's not crashing with the LTS 4.4 and 4.9 kernels; many reports of various
crashes in 4.12, 4.14 and 4.15 were posted here. It is really hard to say
which of the p
patibility, these tools
could be issued by 'btrfs' wrapper binary.
--
Tomasz Pala <go...@pld-linux.org>
lanned:
http://0pointer.net/blog/projects/stateless.html
--
Tomasz Pala <go...@pld-linux.org>
agree with someone who refuses to do _anything_.
You can choose to follow whatever: MD, LVM, ZFS, invent something
totally different, write a custom daemon, or put timeout logic inside the
kernel itself. It doesn't matter. You know the ecosystem - it is
udev that must be signalled somehow and sy
ootflags at all:
grep -iE 'rootflags|degraded|btrfs' openrc/**/*
it won't support this without some extra code.
> The thing is, it primarily breaks if there are hardware issues,
> regardless of the init system being used, but at least the other init
> systems _give you an error message_ (even if it's really the kernel
> spitting it out) instead of just hanging there forever with no
> indication of what's going on like systemd does.
If your systemd waits forever and you get no error messages, report a bug
to your distro maintainer, as he is probably the one to blame for fixing
what ain't broken.
--
Tomasz Pala <go...@pld-linux.org>
On Tue, Jan 30, 2018 at 16:09:50 +0100, Tomasz Pala wrote:
>> BCP for over a
>> decade has been to put multipathing at the bottom, then crypto, then
>> software RAID, than LVM, and then whatever filesystem you're using.
>
> Really? Let's enumerate some caveats of t
inux-wide consensus. And if anyone succeeded,
there would be some Austins blaming them for 'overtaking the good old
trashyard into a coherent de facto standard.'
> In this particular case, you don't need a daemon because the kernel does
> the state tracking.
Sure, MD doesn't require a daemon and LVM does
cially because it requires the
current udev rule to be slightly changed.
--
Tomasz Pala <go...@pld-linux.org>
out calling a mount helper, most notably the kernel when
> it mounts the root filesystem on boot if you're not using an initramfs).
> All in all, this type of thing gets out of hand _very_ fast.
You need to think about the two separately:
1. tracking STATE - this is remembering 'allow-d
e already described a few today, pointed the source and gave some
possible alternate solutions.
> which is why no other init system does it, and in fact no
Other init systems either fail at mounting degraded btrfs just like
systemd does, or carry buggy workarounds reimplemented in each of them
just to handle a thing that should be centrally organized.
--
Tomasz Pala <go...@pld-linux.org>
, so the
umount SHOULD happen, or we are facing some MALFUNCTION, which is fatal
in itself, not by being a "race condition".
--
Tomasz Pala <go...@pld-linux.org>
irresponsible
to hardcode any mount-degraded rule inside systemd itself.
That is exactly why this must go through udev - udev is responsible
for handling devices in the Linux world. How can I register a btrfs device
in udev, since it's overlapping the block device? I can't - the ioctl
is one-way,
RYING_DEGRADED (when
instructed to do so after an expired timeout), systemd could handle
additional per-filesystem fstab options, like x-systemd.allow-degraded.
Then it would be possible to have a best-effort policy for the rootfs (to
make the machine boot), and a stricter one for crucial data (do not mount it
w
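In fstab terms, the per-filesystem policy proposed above might look like
this (a sketch only; x-systemd.allow-degraded is the hypothetical option
being proposed, not an existing systemd feature, and the device names are
examples):

```
# best-effort policy for rootfs: allow a degraded mount so the machine boots
/dev/sda2  /      btrfs  defaults,x-systemd.allow-degraded  0 0
# strict policy for crucial data: never mount it degraded
/dev/sdb1  /data  btrfs  defaults                           0 0
```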
but overall _availability_.
I do not care if there are 2, 5 or 100 devices. I do care whether there are
ENOUGH devices to run regularly (including N-way mirroring and hot spares)
and, if not, whether there are ENOUGH devices to run degraded. Having ALL
the devices is just the edge case.
--
Tomasz Pala &
On Sun, Jan 28, 2018 at 01:00:16 +0100, Tomasz Pala wrote:
> It can't mount degraded, because the "missing" device might go online a
> few seconds ago.
s/ago/after/
>> The central problem is the lack of a timer and time out.
>
> You got mdadm-last-resort@.timer/ser
s 'not available',
don't expect it to keep being used. Just fix the code to match reality.
--
Tomasz Pala <go...@pld-linux.org>
l has already mounted it" and ignore kernel screaming
"the device is (not yet there/gone)"?
Just update the internal state after successful mount and this
particular problem is gone. Unless there is some race condition and the
state should be changed before the mount is announced
Y returns "READY" (or new value "DEGRADED") -> udev
catches event and changes SYSTEMD_READY -> systemd mounts the volume.
This is really simple. All you need to do is to pass "degraded" to the
btrfs.ko, so the BTRFS_IOC_DEVICES_READY would return "go a
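The handoff described above is driven by the stock udev rule (simplified
here from systemd's 64-btrfs.rules); today it only knows a binary
ready/not-ready answer, with no third "degraded-possible" state:

```
# 64-btrfs.rules (simplified): ask the kernel via BTRFS_IOC_DEVICES_READY
# and hold the device back from systemd until all members are registered.
SUBSYSTEM=="block", ACTION=="add|change", ENV{ID_FS_TYPE}=="btrfs", \
    IMPORT{builtin}="btrfs ready $devnode"
ENV{ID_BTRFS_READY}=="0", ENV{SYSTEMD_READY}="0"
```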
ssembled *OR* times out and the kernel gets
instructed to run the array as degraded, which results in /dev/mdX appearing.
There is NO additional logic in systemd.
It is NOT systemd that assembles a degraded md array; it is mdadm that
tells the kernel to assemble it, and systemd mounts the READY md device.
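For reference, the mdadm last-resort mechanism mentioned above amounts to
two tiny units shipped with mdadm (paraphrased here from memory and
stripped of Conflicts= lines; check your distro's copies for the exact
contents):

```
# mdadm-last-resort@.timer: starts counting once the incomplete array appears
[Timer]
OnActiveSec=30

# mdadm-last-resort@.service: after the timeout, tell the kernel to run degraded
[Service]
Type=oneshot
ExecStart=/sbin/mdadm --run /dev/%i
```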
Moreover, systemd giv
l devices are present. So, mount will succeed,
> right?
Systemd doesn't count anything; it asks BTRFS_IOC_DEVICES_READY, as
implemented in btrfs/super.c.
> Ie, the thing systemd can safely do, is to stop trying to rule everything,
> and refrain from telling the user whether he can moun
eady to be mounted, but not fully populated" (i.e.
"degraded mount possible"). Then systemd could _fall back_ to a degraded
mount automatically after timing out, according to some systemd-level
option.
Unless there is *some* signalling from btrfs, there is really not much
systemd can *
implementation issues/quirks, _not_ related to possible
hardware malfunctions.
--
Tomasz Pala <go...@pld-linux.org>
d is required for rootfs).
--
Tomasz Pala <go...@pld-linux.org>
is would render any NAS/FC/iSCSI-backed or more complicated systems
unusable, or hide problems in case of temporary connection issues.
systemd waits for the _underlying_ device - unless btrfs exposes them as
a list of _actual_ devices to wait for, there is nothing except for
waiting for btrfs itse
Errata:
On Wed, Dec 20, 2017 at 09:34:48 +0100, Tomasz Pala wrote:
> /dev/sda -> 'not ready'
> /dev/sdb -> 'not ready'
> /dev/sdc -> 'ready', triggers /dev/sda -> 'not ready' and /dev/sdb - still
> 'not ready'
> /dev/sdc -> kernel says 'ready', triggers /
them as 'ready'" so udev could
fire its rules. And if there were anything for udev to distinguish
'ready' from 'ready-degraded', one could easily compose some notification
scripting on top of it, including sending e-mail to the sysadmin.
Is there anything that would make the kernel
er or anything else to make the *R*aid work.
> There's a mount option for it per-filesystem. Just add that to all your
> mount calls, and you get exactly the same effect.
If only they were passed...
--
Tomasz Pala <go...@pld-linux.org>
t would require
> constant curation to keep up to date. Only for long-term known issues
OK, you've convinced me that a kernel-vs-feature list is overhead.
So maybe another approach: just like systemd sets the system time (when no
time source is available) to its own release date, ma
one if the current kernel handles degraded RAID1
without switching to r/o, doesn't it? Or is something else missing?
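The version-gate idea could be as simple as the following sketch (the
(4, 14) cutoff is an ASSUMPTION for illustration, not a verified fix
version; pick the real one from the changelog):

```python
# Sketch: gate the degraded-mount fallback on the running kernel version.
# The (4, 14) cutoff below is an illustrative assumption only.
def degraded_rw_ok(kernel_release, minimum=(4, 14)):
    """True if this kernel is assumed to keep a degraded RAID1 writable."""
    major, minor = (int(x) for x in kernel_release.split(".")[:2])
    return (major, minor) >= minimum

print(degraded_rw_ok("4.9.46"))   # False: assume it flips to read-only
print(degraded_rw_ok("4.15.3"))   # True
```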
--
Tomasz Pala <go...@pld-linux.org>
default, might be a kernel compile-time knob, a module
parameter or anything else to make the *R*aid work.
--
Tomasz Pala <go...@pld-linux.org>
nt to worry you, but properly managed RAIDs make trivial I/J-of-K
failures transparent, just like ECC protects N/M bits transparently.
Investigating the reasons is the sysadmin's job, just like other
maintenance, including restoring the protection level.
--
Tomasz Pala <go...@pld-linux.org>
--
To
y other corner cases or usage
scenarios. In fact, not only the internals, but also the motivation and
design principles must be well understood to write a piece of documentation.
Otherwise some "fake news" propaganda is created, just like
https://suckless.org/sucks/systemd or other syste
if I had a RAID1.
4. As already said before, using r/w degraded RAID1 is FULLY ACCEPTABLE,
as long as you accept "no more redundancy"...
4a. ...or had an N-way mirror and there is still some redundancy if N>2.
Since we agree that btrfs RAID != common RAID, as there are/were
diffe
r degraded mount, not nice... Not
_expected_ to happen after single disk failure (without any reappearing).
--
Tomasz Pala <go...@pld-linux.org>
s).
I'd say that, from a security point of view, nocow should be the default
unless specified for a mount or a specific file... Currently, if I mount
with nocow, there is no way to whitelist trusted users or a secure
location, and until btrfs-specific options can be handled per
subvolume, there is rea
tuation have with snapshots anyway?
Oh, and BTW - 900+ extents for ~5 GB taken means there is about 5.5 MB
occupied per extent. How is that possible?
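The per-extent figure above checks out, taking the ~5 GB literally as GiB
(quick arithmetic):

```python
# Sanity check of the numbers quoted above: ~5 GiB spread over 933 extents.
size_mib = 5 * 1024        # ~5 GiB expressed in MiB
extents = 933              # extent count reported for log.14
print(round(size_mib / extents, 1))   # prints 5.5 (MiB per extent on average)
```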
--
Tomasz Pala <go...@pld-linux.org>
File log.14 has 933 extents:
# Logical Physical Length Fl
On Sun, Dec 10, 2017 at 12:27:38 +0100, Tomasz Pala wrote:
> I have found a directory - pam_abl databases, which occupy 10 MB (yes,
> TEN MEGAbytes) and released ...8.7 GB (almost NINE GIGAbytes) after
# df
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda2        64G
files due to the nature of data loss (beginning of blocks).
--
Tomasz Pala <go...@pld-linux.org>
greater than the space lost on inter-snapshot duplication.
I can't just defrag the entire filesystem since it breaks links with snapshots.
This change was a real deal-breaker here...
Any way to feed the deduplication code with snapshots maybe? There are
directories and files in the same layou
ly, as the needs are conflicting, but their
impact might be nullified by some housekeeping.
--
Tomasz Pala <go...@pld-linux.org>
On Sat, Dec 02, 2017 at 17:28:12 +0100, Tomasz Pala wrote:
>> Suppose you start with a 100 MiB file (I'm adjusting the sizes down from
> [...]
>> Now make various small changes to the file, say under 16 KiB each. These
>> will each be COWed elsewhere as one might expe
kily few people have this sort of usage pattern, but if you do...
>
> It would certainly explain the space eating...
Did anyone investigate how this is related to RRD rewrites? I don't use
rrdcached, and never thought that 100 MB of data might trash an entire
filesystem...
best regards,
--
Tomas
ationale behind this is obvious: since the snapshot-aware defrag was
removed, allow defragmenting only the data exclusive to a snapshot.
This would of course result in partial file defragmentation, but that
should be enough for pathological cases like mine.
--
Tomasz Pala <go...@pld-linux.org>
user 0.00s system 0% cpu 30.798 total
> And further more, please ensure that all deleted files are really deleted.
> Btrfs delay file and subvolume deletion, so you may need to sync several
> times or use "btrfs subv sync" to ensure deleted files are deleted.
Yes, I was aware
that was taken after some minor (<100 MB) changes
from the subvolume, which has undergone some minor changes since then,
occupied 8 GB during one night while the entire system was idling.
This was cross-checked against file metadata (compared mtimes) and 'du'
results.
As a last-resort I've reba
s extremely
low. Actually most of the diffs between subvolumes come from updating
distro packages. There were not many reflink copies made on this
partition, only one kernel source compiled (.ccache files removed
today). So this partition is as clean as it could be after almost
5 months in use.
Act
:              55.97GiB
   Metadata,RAID1:      2.00GiB
   System,RAID1:       32.00MiB
   Unallocated:         4.93GiB

/dev/sdb2, ID: 2
   Device size:        64.00GiB
   Device slack:          0.00B
   Data,single:       132.00MiB
   Data,RAID1:         55.97GiB
   Meta
her snapshots, much more exclusive data
is shown in qgroups than actually found in files. So if not in files, where
is that space wasted? Metadata?
btrfs-progs-4.12 running on Linux 4.9.46.
best regards,
--
Tomasz Pala <go...@pld-linux.org>