Hi Darrick,
Thanks for commenting.
+ memcpy(sb->s_uuid, fs_info->fsid, BTRFS_FSID_SIZE);
uuid_copy()?
It requires a larger migration to use uuid_t; IMO it can be done all
together, in a separate patch?
Just as an experiment, starting with struct btrfs_fs_info.fsid and
to
From: Su Yue
In replay_xattr_deletes(), the argument @slot of verify_dir_item()
should be variable @i instead of path->slots[0].
The bug causes failure of generic/066 and shared/002 in xfstests.
dmesg:
[12507.810781] BTRFS critical (device dm-0): invalid dir item name
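The shape of the bug can be shown with a toy model (all names and data here are made up, not the kernel code): validating every item against a fixed slot instead of the loop variable lets a bad item at another slot slip through.

```c
#include <assert.h>

/* Hypothetical simplification of the replay_xattr_deletes() bug: when
 * checking every item in a leaf, the slot passed to the verifier must
 * be the loop variable i, not a fixed path->slots[0]-style index. */

#define NR_SLOTS 4

/* toy "leaf": each slot holds a dir item name length; 0 means invalid */
static const int name_len[NR_SLOTS] = { 5, 7, 0, 3 };

static int verify_dir_item(int slot)
{
        return name_len[slot] > 0;        /* 1 = valid, 0 = invalid */
}

/* buggy: always checks the same first slot, so slot 2's bad item passes */
static int count_valid_buggy(int first_slot)
{
        int i, valid = 0;

        for (i = 0; i < NR_SLOTS; i++)
                if (verify_dir_item(first_slot))   /* wrong: fixed slot */
                        valid++;
        return valid;
}

/* fixed: check the slot we are actually iterating over */
static int count_valid_fixed(void)
{
        int i, valid = 0;

        for (i = 0; i < NR_SLOTS; i++)
                if (verify_dir_item(i))            /* right: use @i */
                        valid++;
        return valid;
}
```

With the fixed index the invalid item at slot 2 is caught; with the fixed-slot bug it is never looked at.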
Austin S. Hemmelgarn posted on Tue, 01 Aug 2017 10:47:30 -0400 as
excerpted:
> I think I _might_ understand what's going on here. Is that test program
> calling fallocate using the desired total size of the file, or just
> trying to allocate the range beyond the end to extend the file? I've
>
Roman Mamedov posted on Tue, 01 Aug 2017 11:08:05 +0500 as excerpted:
> On Sun, 30 Jul 2017 18:14:35 +0200 "marcel.cochem"
> wrote:
>
>> I am pretty sure that not all data is lost as I can grep through the
>> 100 GB SSD partition. But my question is, if there is a
[ ... ]
> This is the "storage for beginners" version, what happens in
> practice however depends a lot on specific workload profile
> (typical read/write size and latencies and rates), caching and
> queueing algorithms in both Linux and the HA firmware.
To add a bit of slightly more advanced
On Tue, Aug 01, 2017 at 11:04:10AM +0500, Roman Mamedov wrote:
> On Mon, 31 Jul 2017 11:12:01 -0700
> Liu Bo wrote:
>
> > Superblock and chunk tree root is OK, looks like the header part of
> > the tree root is now all-zero, but I'm unable to think of a btrfs bug
> > which
On 2017-08-01 23:00, Christoph Anton Mitterer wrote:
> Hi.
>
> Stupid question:
> Would the write hole be closed already, if parity was checksummed?
No.
The write hole problem is due to a combination of two things:
a) misalignment between parity and data (i.e. unclean shutdown)
b) loss of a
On 2017-08-01 19:24, Liu Bo wrote:
> On Tue, Aug 01, 2017 at 07:42:14PM +0200, Goffredo Baroncelli wrote:
>> Hi Liu,
>>
>> On 2017-08-01 18:14, Liu Bo wrote:
>>> This aims to fix write hole issue on btrfs raid5/6 setup by adding a
>>> separate disk as a journal (aka raid5/6 log), so that after
Hi.
Stupid question:
Would the write hole be closed already, if parity was checksummed?
Cheers,
Chris.
2017-08-01 23:21 GMT+03:00 Leonidas Spyropoulos :
> On 01/08/17, E V wrote:
>> In general I think btrfs takes time proportional to the size of your
>> metadata to mount. Bigger and/or fragmented metadata leads to longer
>> mount times. My big backup fs with >300GB of metadata
Then following problem is directly related with that:
https://unix.stackexchange.com/questions/377914/how-to-test-if-two-btrfs-snapshots-are-identical
Is that a bug or a feature?
2017-08-01 23:33 GMT+03:00 A L :
>
> On 8/1/2017 10:24 PM, Cerem Cem ASLAN wrote:
>>
>>
On 8/1/2017 10:24 PM, Cerem Cem ASLAN wrote:
What does that mean? Can't we replicate the same snapshot with `btrfs
send | btrfs receive` multiple times, because it will have a "Received
UUID" at the first `btrfs receive
You will need to make a new read-write snapshot of the received volume
to
Hi/2 all...
I've been using btrfs for years without any major issues (ok, not true
but it was always my own fault).
This time around I was testing the new BFQ in 4.12 (hint: don't use it
for heavy I/O), and the laptop froze solid. So far so good; the thing
is, once I rebooted I had a corrupted
What does that mean? Can't we replicate the same snapshot with `btrfs
send | btrfs receive` multiple times, because it will have a "Received
UUID" at the first `btrfs receive`?
2017-08-01 15:54 GMT+03:00 A L :
> OK. The problem was that the original subvolume had a
On 01/08/17, E V wrote:
> In general I think btrfs takes time proportional to the size of your
> metadata to mount. Bigger and/or fragmented metadata leads to longer
> mount times. My big backup fs with >300GB of metadata takes over
> 20minutes to mount, and that's with the space tree which is
>
>> [ ... ] a "RAID5 with 128KiB writes and a 768KiB stripe
>> size". [ ... ] several back-to-back 128KiB writes [ ... ] get
>> merged by the 3ware firmware only if it has a persistent
>> cache, and maybe your 3ware does not have one,
> KOS: No I don't have persistent cache. Only the 512 Mb cache
From: Josef Bacik
Our dir_context->pos is supposed to hold the next position we're
supposed to look at. If we successfully insert a delayed dir index, we
could end up with a duplicate entry because we don't increase ctx->pos
after doing the dir_emit.
Signed-off-by: Josef Bacik
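A minimal userspace model of the fix (hypothetical names, not the kernel's readdir machinery): advancing the position after each successful emit is what keeps a resumed call from repeating entries.

```c
#include <assert.h>

/* Toy model of the ctx->pos fix: a directory emitter that can be
 * resumed must advance its position after every successful emit,
 * otherwise the next call re-emits the same entry. */

#define NR_ENTRIES 3

struct dir_ctx { int pos; };

/* Emit up to max entry indexes starting at ctx->pos into out[]. */
static int readdir_emit(struct dir_ctx *ctx, int *out, int max)
{
        int n = 0;

        while (ctx->pos < NR_ENTRIES && n < max) {
                out[n++] = ctx->pos;   /* the "dir_emit" */
                ctx->pos++;            /* the fix: advance, or we repeat */
        }
        return n;
}

/* Two calls of at most two entries each: with the pos bump, the three
 * entries come out exactly once (2 + 1), with no duplicate. */
static int total_after_resume(void)
{
        struct dir_ctx ctx = { 0 };
        int out[NR_ENTRIES];

        return readdir_emit(&ctx, out, 2) + readdir_emit(&ctx, out, 2);
}
```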
From: Josef Bacik
Readdir does dir_emit while under the btree lock. dir_emit can trigger
a page fault, which means we can deadlock. Fix this by allocating a
buffer on opening a directory, copying the readdir into this buffer,
and doing dir_emit from outside of the tree lock.
On 2017-08-01 15:07, Holger Hoffstätte wrote:
On 08/01/17 20:15, Holger Hoffstätte wrote:
On 08/01/17 19:34, Austin S. Hemmelgarn wrote:
[..]
Apparently, if you call fallocate() on a file with an offset of 0 and
a length longer than the length of the file itself, BTRFS will
allocate that exact
On 08/01/17 20:15, Holger Hoffstätte wrote:
> On 08/01/17 19:34, Austin S. Hemmelgarn wrote:
> [..]
>> Apparently, if you call fallocate() on a file with an offset of 0 and
>> a length longer than the length of the file itself, BTRFS will
>> allocate that exact amount of space, instead of just
On Tue, Aug 1, 2017 at 12:36 PM, Alan Brand wrote:
> I successfully repaired the superblock, copied it from one of the backups.
> My biggest problem now is that the UUID for the disk has changed due
> to the reformatting and no longer matches what is in the metadata.
> I
On Tue, Aug 01, 2017 at 07:42:14PM +0200, Goffredo Baroncelli wrote:
> Hi Liu,
>
> On 2017-08-01 18:14, Liu Bo wrote:
> > This aims to fix write hole issue on btrfs raid5/6 setup by adding a
> > separate disk as a journal (aka raid5/6 log), so that after unclean
> > shutdown we can make sure data
On Thu, Jul 27, 2017 at 8:49 AM, Alan Brand wrote:
> I know I am screwed but hope someone here can point at a possible solution.
>
> I had a pair of btrfs drives in a raid0 configuration. One of the
> drives was pulled by mistake, put in a windows box, and a quick NTFS
>
On Tue, Aug 01, 2017 at 10:56:39AM -0600, Liu Bo wrote:
> On Tue, Aug 01, 2017 at 05:28:57PM +, Hugo Mills wrote:
> >Hi,
> >
> >Great to see something addressing the write hole at last.
> >
> > On Tue, Aug 01, 2017 at 10:14:23AM -0600, Liu Bo wrote:
> > > This aims to fix write hole
On 08/01/17 19:34, Austin S. Hemmelgarn wrote:
[..]
> Apparently, if you call fallocate() on a file with an offset of 0 and
> a length longer than the length of the file itself, BTRFS will
> allocate that exact amount of space, instead of just filling in holes
> in the file and allocating space to
- Original Message -
From: "Peter Grandi"
To: "Linux fs Btrfs"
Sent: Tuesday, 1 August, 2017 3:14:07 PM
Subject: Re: Btrfs + compression = slow performance and high cpu usage
> Peter, I don't think the filefrag is showing the
On Tue, Aug 01, 2017 at 01:39:59PM -0400, Austin S. Hemmelgarn wrote:
> On 2017-08-01 13:25, Roman Mamedov wrote:
> > On Tue, 1 Aug 2017 10:14:23 -0600
> > Liu Bo wrote:
> >
> > > This aims to fix write hole issue on btrfs raid5/6 setup by adding a
> > > separate disk as a
On Thu, Jul 27, 2017 at 9:33 AM, Ivan Sizov wrote:
> I've just noticed a huge number of errors on one of the RAID's disks.
> "btrfs dev stats" gives:
>
> [/dev/sdc1].write_io_errs0
> [/dev/sdc1].read_io_errs 305
> [/dev/sdc1].flush_io_errs0
>
On Tue, Aug 01, 2017 at 10:25:47PM +0500, Roman Mamedov wrote:
> On Tue, 1 Aug 2017 10:14:23 -0600
> Liu Bo wrote:
>
> > This aims to fix write hole issue on btrfs raid5/6 setup by adding a
> > separate disk as a journal (aka raid5/6 log), so that after unclean
> >
On Tue, Aug 01, 2017 at 05:28:57PM +, Hugo Mills wrote:
>Hi,
>
>Great to see something addressing the write hole at last.
>
> On Tue, Aug 01, 2017 at 10:14:23AM -0600, Liu Bo wrote:
> > This aims to fix write hole issue on btrfs raid5/6 setup by adding a
> > separate disk as a
Hi Liu,
On 2017-08-01 18:14, Liu Bo wrote:
> This aims to fix write hole issue on btrfs raid5/6 setup by adding a
> separate disk as a journal (aka raid5/6 log), so that after unclean
> shutdown we can make sure data and parity are consistent on the raid
> array by replaying the journal.
>
it
On 2017-08-01 13:25, Roman Mamedov wrote:
On Tue, 1 Aug 2017 10:14:23 -0600
Liu Bo wrote:
This aims to fix write hole issue on btrfs raid5/6 setup by adding a
separate disk as a journal (aka raid5/6 log), so that after unclean
shutdown we can make sure data and parity
A recent thread on the BTRFS mailing list [1] brought up some odd
behavior in BTRFS that I've long suspected but not had prior reason to
test. I've put the fsdevel mailing list on CC since I'm curious to hear
what people there think about this.
Apparently, if you call fallocate() on a file
Hi,
Great to see something addressing the write hole at last.
On Tue, Aug 01, 2017 at 10:14:23AM -0600, Liu Bo wrote:
> This aims to fix write hole issue on btrfs raid5/6 setup by adding a
> separate disk as a journal (aka raid5/6 log), so that after unclean
> shutdown we can make sure
On Tue, 1 Aug 2017 10:14:23 -0600
Liu Bo wrote:
> This aims to fix write hole issue on btrfs raid5/6 setup by adding a
> separate disk as a journal (aka raid5/6 log), so that after unclean
> shutdown we can make sure data and parity are consistent on the raid
> array by
Currently there is a memory leak if we have an error while adding a
raid5/6 log. Moreover, it didn't abort the transaction as other error
paths do, so this fixes the broken error handling by applying two steps
to initializing the log: step #1 is to allocate memory, check if it has
a proper size, and
This updates recovery code to use the readahead helper.
Signed-off-by: Liu Bo
---
fs/btrfs/raid56.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 24f7cbb..8f47e56 100644
--- a/fs/btrfs/raid56.c
+++
The log space is limited, so reclaim is necessary when there is not enough
space to use.
By recording the largest position we've written to the log disk and
flushing all disks' cache and the superblock, we can be sure that data
and parity before this position have the identical copy in the log
This introduces the add_dev_v2 ioctl to add a device as a raid56
journal device. With the help of a journal device, raid56 is able to
get rid of potential write holes.
Signed-off-by: Liu Bo
---
fs/btrfs/ctree.h| 6 ++
fs/btrfs/ioctl.c| 48
While doing recovery, blocks are read from the raid5/6 disk one by
one, so this adds readahead so that we can read at most 256 contiguous
blocks in one read IO.
Signed-off-by: Liu Bo
---
fs/btrfs/raid56.c | 114 +++---
1
A typical write to the raid5/6 log needs three steps:
1) collect data/parity pages into the bio in io_unit;
2) submit the bio in io_unit;
3) writeback data/parity to raid array in end_io.
1) and 2) are protected within log->io_mutex, while 3) is not.
Since recovery needs to know the checkpoint
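The three steps above can be sketched like this (the "mutex" is mocked with a flag and every name is an assumption, not the patchset's code): steps 1) and 2) run under log->io_mutex, the end_io writeback in 3) runs with the lock already dropped.

```c
#include <assert.h>

/* Mock of the three-step raid5/6 log write: collect and submit under
 * io_mutex, write back to the raid array in end_io outside of it. */

struct rlog {
        int io_mutex_held;
        int collected, submitted, written_back;
};

static void log_write(struct rlog *log)
{
        log->io_mutex_held = 1;   /* mutex_lock(&log->io_mutex) */
        log->collected++;         /* 1) collect data/parity pages into bio */
        log->submitted++;         /* 2) submit the bio in the io_unit */
        log->io_mutex_held = 0;   /* mutex_unlock(&log->io_mutex) */

        assert(!log->io_mutex_held);
        log->written_back++;      /* 3) end_io writeback, lock not held */
}
```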
This introduces an option for 'btrfs device add' to add a device as
raid5/6 log at run time.
Signed-off-by: Liu Bo
---
cmds-device.c | 30 +-
ioctl.h | 3 +++
2 files changed, 28 insertions(+), 5 deletions(-)
diff --git a/cmds-device.c
A raid5/6 log can be loaded while mounting a btrfs (which already has
a disk set up as raid5/6 log) or setting up a disk as raid5/6 log for
the first time.
It gets %journal_tail from the super_block, where it can read the first
4K block, and goes through the sanity checks; if it's valid, then go check
This is adding the ability to use a disk as raid5/6's stripe log (aka
journal), the primary goal is to fix write hole issue that is inherent
in raid56 setup.
In a typical raid5/6 setup, both full stripe write and a partial
stripe write will generate parity at the very end of writing, so after
This is adding recovery on raid5/6 log.
We've set a %journal_tail in super_block, which indicates the position
from where we need to replay data. So we scan the log and replay
valid meta/data/parity pairs until finding an invalid one. By
replaying, it simply reads data/parity from the raid5/6
Signed-off-by: Liu Bo
---
fs/btrfs/raid56.c | 2 ++
fs/btrfs/volumes.c | 7 ++-
fs/btrfs/volumes.h | 4
3 files changed, 12 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 2b91b95..c75766f 100644
--- a/fs/btrfs/raid56.c
+++
We've recorded the journal_tail of the raid5/6 log in the super_block
so that recovery of the raid5/6 log can scan from this position.
This teaches inspect-dump-super to acknowledge %journal_tail.
Signed-off-by: Liu Bo
---
cmds-inspect-dump-super.c | 2 ++
ctree.h | 6
We need to initialize the raid5/6 log after adding it, but we don't
want to race with concurrent writes. So we initialize it before
assigning the log pointer in %fs_info.
Signed-off-by: Liu Bo
---
fs/btrfs/disk-io.c | 2 +-
fs/btrfs/raid56.c | 18 --
We've put the flag BTRFS_DEV_RAID56_LOG in device->type, so we can
recognize the journal device of raid56 while reading the chunk tree.
Signed-off-by: Liu Bo
---
fs/btrfs/volumes.c | 12
1 file changed, 12 insertions(+)
diff --git a/fs/btrfs/volumes.c
The journal device (aka raid56 log) is not for chunk allocation, so
let's skip it.
Signed-off-by: Liu Bo
---
fs/btrfs/volumes.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index dafc541..5c50df7 100644
---
This aims to fix write hole issue on btrfs raid5/6 setup by adding a
separate disk as a journal (aka raid5/6 log), so that after unclean
shutdown we can make sure data and parity are consistent on the raid
array by replaying the journal.
The idea and the code are similar to the write-through mode
This adds checksums to the meta/data/parity resident on the raid5/6
log, so recovery can now verify checksums to see if anything inside
meta/data/parity has been changed.
If anything is wrong in a meta block, we stop replaying data/parity at
that position, while if anything is wrong in data/parity
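The stop-at-first-bad-checksum rule can be sketched as follows (toy checksums and hypothetical names; the real code verifies per-block checksums while scanning from %journal_tail):

```c
#include <assert.h>
#include <stddef.h>

/* Replay log entries while their stored checksum matches the one
 * computed over the block's contents; stop at the first mismatch. */
static int replay_log(const unsigned stored[], const unsigned computed[],
                      size_t n)
{
        size_t i;

        for (i = 0; i < n; i++)
                if (stored[i] != computed[i])
                        break;         /* first corrupt entry ends replay */
        return (int)i;                 /* number of entries replayed */
}
```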
On 2017-08-01 12:50, pwm wrote:
I did a temporary patch of the snapraid code to start fallocate() from
the previous parity file size.
Like I said though, it's BTRFS that's misbehaving here, not snapraid.
I'm going to try to get some further discussion about this here on the
mailing list,and
I did a temporary patch of the snapraid code to start fallocate() from the
previous parity file size.
Finally have a snapraid sync up and running. Looks good, but will take
quite a while before I can try a scrub command to double-check everything.
Thanks for the help.
/Per W
On Tue, 1 Aug
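The workaround described above (asking fallocate() only for the range beyond the current end of the parity file, instead of re-requesting the whole file from offset 0) could look roughly like this sketch. The path and function name are made up, and posix_fallocate() stands in for whatever call snapraid actually uses:

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Grow a parity file to new_size by allocating only the tail range
 * [st_size, new_size), so a filesystem that takes the request
 * literally does not allocate the already-present part again.
 * Returns the resulting file size, or -1 on error. */
static long grow_parity_file(const char *path, long new_size)
{
        struct stat st;
        long size;
        int fd = open(path, O_CREAT | O_RDWR, 0644);

        if (fd < 0)
                return -1;
        if (fstat(fd, &st) < 0) {
                close(fd);
                return -1;
        }
        /* allocate only the extension, not [0, new_size) */
        if (new_size > st.st_size &&
            posix_fallocate(fd, st.st_size, new_size - st.st_size) != 0) {
                close(fd);
                return -1;
        }
        if (fstat(fd, &st) < 0) {
                close(fd);
                return -1;
        }
        size = (long)st.st_size;
        close(fd);
        return size;
}
```

How much space this actually reserves is filesystem-dependent, which is exactly the behavior under discussion in this thread.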
2017-08-01 0:39 GMT+03:00 Ivan Sizov :
> 2017-08-01 0:17 GMT+03:00 Marc MERLIN :
>> On Tue, Aug 01, 2017 at 12:07:14AM +0300, Ivan Sizov wrote:
>>> 2017-07-09 10:57 GMT+03:00 Martin Steigerwald :
>>> > Hello Marc.
>>> >
>>> > Marc MERLIN -
On 2017-08-01 11:24, pwm wrote:
Yes, the test code is as below - trying to match what snapraid tries to do:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/stat.h>
int main() {
int fd = open("/mnt/snap_04/snapraid.parity",O_NOFOLLOW|O_RDWR);
if (fd < 0) {
On Tue, Aug 01, 2017 at 06:35:08PM +0800, Anand Jain wrote:
> We didn't copy fsid to struct super_block.s_uuid so Overlay disables
> index feature with btrfs as the lower FS.
>
> kernel: overlayfs: fs on '/lower' does not support file handles, falling back
> to index=off.
>
> Fix this by
Commit 38851cc19adb ("Btrfs: implement unlocked dio write") implemented
unlocked dio write, allowing multiple dio writers to write to
non-overlapping and non-eof-extending regions. In doing so it also
introduced a broken memory barrier. It is broken due to 2 things:
1. Memory barriers _MUST_
Yes, the test code is as below - trying to match what snapraid tries
to do:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/stat.h>
int main() {
int fd = open("/mnt/snap_04/snapraid.parity",O_NOFOLLOW|O_RDWR);
if (fd < 0) {
printf("Failed opening parity file
On 8/1/17, Duncan <1i5t5.dun...@cox.net> wrote:
> Imran Geriskovan posted on Mon, 31 Jul 2017 22:32:39 +0200 as excerpted:
Now the init on /boot is a "19 lines" shell script, including lines
for keymap, hdparm, cryptsetup. And let's not forget this is made
possible by a custom kernel and
On 2017-08-01 10:47, Austin S. Hemmelgarn wrote:
On 2017-08-01 10:39, pwm wrote:
Thanks for the links and suggestions.
I did try your suggestions but it didn't solve the underlying problem.
pwm@europium:~$ sudo btrfs balance start -v -dusage=20 /mnt/snap_04
Dumping filters: flags 0x1, state
On 2017-08-01 10:39, pwm wrote:
Thanks for the links and suggestions.
I did try your suggestions but it didn't solve the underlying problem.
pwm@europium:~$ sudo btrfs balance start -v -dusage=20 /mnt/snap_04
Dumping filters: flags 0x1, state 0x0, force is off
DATA (flags 0x2): balancing,
Thanks for the links and suggestions.
I did try your suggestions but it didn't solve the underlying problem.
pwm@europium:~$ sudo btrfs balance start -v -dusage=20 /mnt/snap_04
Dumping filters: flags 0x1, state 0x0, force is off
DATA (flags 0x2): balancing, usage=20
Done, had to relocate
> Peter, I don't think the filefrag is showing the correct
> fragmentation status of the file when the compression is used.
As reported in a previous message, the output of 'filefrag -v' can be
used to see what is going on:
filefrag /mnt/sde3/testfile
/mnt/sde3/testfile: 49287
OK. The problem was that the original subvolume had a "Received UUID".
This caused all subsequent snapshots to have the same Received UUID
which messes up Btrfs send | receive. Of course this means I must have
used btrfs send | receive to create that subvolume and then turned it
r/w at some
On Tue, Aug 1, 2017 at 2:43 AM, Leonidas Spyropoulos
wrote:
> Hi Duncan,
>
> Thanks for your answer
In general I think btrfs takes time proportional to the size of your
metadata to mount. Bigger and/or fragmented metadata leads to longer
mount times. My big backup fs with
Hi, Per,
Start here:
https://btrfs.wiki.kernel.org/index.php/FAQ#if_your_device_is_large_.28.3E16GiB.29
In your case, I'd suggest using "-dusage=20" to start with, as
it'll probably free up quite a lot of your existing allocation.
And this may also be of interest, in how to read the
I have a 10TB file system with a parity file for a snapraid. However, I
suddenly cannot extend the parity file despite the file system only being
about 50% filled - I should have 5TB of unallocated space. When trying to
extend the parity file, fallocate() just returns ENOSPC, i.e. that the
> -Original Message-
> From: linux-btrfs-ow...@vger.kernel.org [mailto:linux-btrfs-
> ow...@vger.kernel.org] On Behalf Of Konstantin V. Gavrilenko
> Sent: Tuesday, 1 August 2017 7:58 PM
> To: Peter Grandi
> Cc: Linux fs Btrfs
>
We didn't copy fsid to struct super_block.s_uuid so Overlay disables
index feature with btrfs as the lower FS.
kernel: overlayfs: fs on '/lower' does not support file handles, falling back
to index=off.
Fix this by publishing the fsid through struct super_block.s_uuid.
Signed-off-by: Anand
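A userspace toy of that fix (the struct names only loosely mirror the kernel's `struct super_block` and `struct btrfs_fs_info`; this is not the actual patch): copy the 16-byte fsid into the superblock's s_uuid field so an upper layer can see it.

```c
#include <assert.h>
#include <string.h>

/* Publish the filesystem's 16-byte fsid through the superblock's
 * s_uuid so layers stacked on top (e.g. overlayfs) can use it for
 * file-handle support. */

#define FSID_SIZE 16

struct super_blk { unsigned char s_uuid[FSID_SIZE]; };
struct fsinfo    { unsigned char fsid[FSID_SIZE]; };

static void publish_fsid(struct super_blk *sb, const struct fsinfo *fs)
{
        memcpy(sb->s_uuid, fs->fsid, FSID_SIZE);
}
```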
Peter, I don't think the filefrag is showing the correct fragmentation status
of the file when the compression is used.
At least with the one that is installed by default in Ubuntu 16.04 -
e2fsprogs | 1.42.13-1ubuntu1.
So for example, fragmentation of a compressed file is 320 times more than
Hi Duncan,
Thanks for your answer
On 01/08/17, Duncan wrote:
>
> If you're doing any snapshotting, you almost certainly want noatime, not
> the default relatime. Even without snapshotting and regardless of the
> filesystem, tho on btrfs it's a bigger factor due to COW, noatime is a
>
On Mon, Jul 31, 2017 at 03:00:53PM -0700, Justin Maggard wrote:
> Marc, do you have quotas enabled? IIRC, you're a send/receive user.
> The combination of quotas and btrfs receive can corrupt your
> filesystem, as shown by the xfstest I sent to the list a little while
> ago.
Thanks for checking.
On Sun, 30 Jul 2017 18:14:35 +0200
"marcel.cochem" wrote:
> I am pretty sure that not all data is lost as I can grep through the
> 100 GB SSD partition. But my question is, if there is a tool to rescue
> all (intact) data and maybe have only a few corrupt files
On Mon, 31 Jul 2017 11:12:01 -0700
Liu Bo wrote:
> Superblock and chunk tree root is OK, looks like the header part of
> the tree root is now all-zero, but I'm unable to think of a btrfs bug
> which can lead to that (if there is, it is a serious enough one)
I see that the