Re: Filsystemkorruption i ext4?

2024-03-28 Thread Hans
Hi Jesper,

RAID 1 is mirroring. I suppose one reason for the failure might be a timing
problem. I do not know for sure whether your system has a real (hardware) RAID
controller or whether the RAID is done in software.

A real controller should not produce write errors, although it might happen
under heavy load. I have never used RAID 1 myself, as I am a fancy guy and
no friend of RAID 1. The catch is that when an error is written to one drive,
it ends up on the other, too.

My fancy solution was to use one drive and mirror it to a second one every 30
minutes using rsync. IMO this gives me several options:

1. If the hard drive is defective, I can boot from the other one.

2. If the software is defective, I have 30 minutes to discover the failure
(any good logging system should raise an alarm in time).

3. I have a running backup available.

4. I can replace the defective hard drive while the system is running.

5. After the replacement, I can examine what happened (hardware failure,
malware, whatever).
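
A periodic mirror like the one described above could be sketched roughly as
follows. This is only an illustration under assumptions, not the script
actually used: the paths (/ and /mnt/mirror), the function names, the script
location and the cron file name are all made up, and the rsync options are
just one plausible choice.

```shell
#!/bin/sh
# Sketch of a periodic rsync mirror. SRC and DEST are hypothetical defaults.
SRC="${SRC:-/}"
DEST="${DEST:-/mnt/mirror}"

# Print the rsync command line that would be run.
#   -a        archive mode (permissions, times, owners, symlinks)
#   -H -A -X  preserve hard links, ACLs and extended attributes
#   -x        stay on one filesystem (skips /proc, /sys and other mounts)
#   --delete  remove files from the mirror that are gone from the source
mirror_command() {
    printf 'rsync -aHAXx --delete %s/ %s/\n' "${1%/}" "${2%/}"
}

# Run the mirror only if the destination really is a mount point, so that
# a missing second drive does not silently fill up the first one.
run_mirror() {
    if ! mountpoint -q "$2"; then
        echo "destination $2 is not mounted, skipping" >&2
        return 1
    fi
    rsync -aHAXx --delete "${1%/}/" "${2%/}/"
}

# Hypothetical cron entry (e.g. in /etc/cron.d/mirror), every 30 minutes:
#   */30 * * * * root /usr/local/sbin/mirror.sh run
if [ "${1:-}" = "run" ]; then
    run_mirror "$SRC" "$DEST"
fi
```

Note that a file-level copy does not by itself make the second drive
bootable; a boot loader would have to be installed on it separately.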

Many people will now laugh at me, but this has worked best for me. I
reached an uptime of more than 700 days, although that may be due less to my
work than to the work of all the Debian developers!

As I said before, I am not very experienced with RAID 1; other people may
know much more.

Personally, I believe RAID is mostly used with Windows, as Windows does not
have nice tools like rsync or syslog and all the other things that make Linux
and Debian so great.

Happy Easter!

Best

Hans 







Re: Filsystemkorruption i ext4?

2024-03-28 Thread Jesper Dybdal

On 2024-03-28 15:02, Hans wrote:

> On Thursday, 28 March 2024, 14:49:37 CET, Jesper Dybdal wrote:
> Hello,
>
> memtest86+ is for testing RAM, but do you not want to test ext4 filesystem?

Sorry - I should have left more of the previous mails quoted.  I have
previously tested the RAID1 consistency (ok), fixed the file system
(found 3 files with incorrect block count), and now also tested the
RAM.

And since it seems unlikely that it is a bug in ext4 (in Debian
Bullseye), I don't quite understand how such an inconsistency can occur.

Thanks for your response,
Jesper

> If so, I suggest to boot a live system like Knoppix or similar, then run your
> test by using
>
> e2fsck -y /dev/sda1
>
> or wherever your filesystem resides.
>
> Please pay attention: If you have encrypted filesystems, then first open the
> encryption, do NOT mount the filesystem and then check it, for example:
>
> cryptsetup luksOpen /dev/sda1 data1
>
> then enter the password and now you can run
>
> e2fsck -y /dev/mapper/data1
>
> Note: the word "data1" is only an example, you can name it, whatever you want
> like "space", "soap", "bullet", "henry" or whatever.
>
> Hope this helps.
>
> Best
>
> Hans










--
Jesper Dybdal
https://www.dybdal.dk


Re: Filsystemkorruption i ext4?

2024-03-28 Thread Hans
On Thursday, 28 March 2024, 14:49:37 CET, Jesper Dybdal wrote:
Hello,

memtest86+ tests RAM, but don't you also want to check the ext4 filesystem?

If so, I suggest booting a live system such as Knoppix and running the check
with

e2fsck -y /dev/sda1

or whichever device your filesystem resides on.
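
Since -y lets e2fsck change the filesystem without asking, it may be worth
doing a read-only pass first to see what it would change. A minimal sketch,
assuming the device is /dev/sda1 (adapt as needed); the function names are
made up for this example, and the mounted-check only recognises the device
under the exact name it has in /proc/mounts:

```shell
#!/bin/sh
# Sketch: check an ext4 filesystem without modifying it first.
# The device name is only an example and must be adapted.

# Succeed if the device appears as a mount source in the given mounts
# table (default: /proc/mounts).
is_mounted() {
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# Run a forced, read-only check:
#   -f  check even if the filesystem is marked clean
#   -n  open read-only and answer "no" to every question
check_readonly() {
    if is_mounted "$1"; then
        echo "$1 is mounted; unmount it or boot a live system first" >&2
        return 1
    fi
    e2fsck -f -n "$1"
}

# Usage (as root, from a live system):
#   check_readonly /dev/sda1    # look at what would be changed
#   e2fsck -f -y /dev/sda1      # then let e2fsck repair it
```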

Please note: if your filesystem is encrypted, first open the encrypted
device, do NOT mount the filesystem, and then check it. For example:

cryptsetup luksOpen /dev/sda1 data1

then enter the password and now you can run

e2fsck -y /dev/mapper/data1

Note: the name "data1" is only an example; you can call it whatever you want,
such as "space", "soap", "bullet", "henry" or anything else.
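
Put together, the encrypted case could look roughly like the sketch below.
The device /dev/sda1 and the mapping name "data1" are the examples from
above; the function names are invented for illustration. It uses
"cryptsetup open --type luks", which is the newer spelling of "luksOpen".

```shell
#!/bin/sh
# Sketch: open a LUKS container, check the filesystem inside, close it again.
# /dev/sda1 and the mapping name "data1" are examples only.

# Path under which an opened mapping appears.
mapper_path() {
    printf '/dev/mapper/%s\n' "$1"
}

check_luks() {
    dev="$1"
    name="$2"
    # isLuks exits non-zero if the device has no LUKS header.
    cryptsetup isLuks "$dev" || { echo "$dev is not a LUKS device" >&2; return 1; }
    # Prompts for the passphrase and creates /dev/mapper/<name>.
    cryptsetup open --type luks "$dev" "$name" || return 1
    # Check the decrypted mapping, never the raw partition.
    e2fsck -f -y "$(mapper_path "$name")"
    status=$?
    # Close the mapping again whatever the check reported.
    cryptsetup close "$name"
    return "$status"
}

# Usage (as root, filesystem not mounted):
#   check_luks /dev/sda1 data1
```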

Hope this helps.

Best

Hans









Re: Filsystemkorruption i ext4?

2024-03-28 Thread Jesper Dybdal
[Sorry - I accidentally sent this too quickly in an incomplete state.  
Second try here:]


> On Wed, Mar 20, 2024, 11:28 AM Jesper Dybdal wrote:
>
> I think I'll let memtest86+ run overnight one of the coming nights.
>
> Unless it is simply a RAM error, then it is a bit scary...

I've now let memtest86+ run for 9 hours, during which it did 14 passes
of all its tests.  It found nothing wrong.

On 2024-03-20 22:58, Nicholas Geovanis wrote:
> I have seen that a couple times, unlikely but possible. Maybe review
> your RAM configuration too, ensure that the sticks are on the same
> supported refresh rate and distributed across the slots in an approved
> way.

There is only one RAM stick (of 16 GB), so there should be no problems
of that kind.


I'm afraid I won't find an explanation of that file system corruption :-(

Thanks to Franco and Nicholas for your responses,
Jesper

--
Jesper Dybdal
https://www.dybdal.dk


Re: Filsystemkorruption i ext4?

2024-03-28 Thread Jesper Dybdal

On 2024-03-20 22:58, Nicholas Geovanis wrote:


On Wed, Mar 20, 2024, 11:28 AM Jesper Dybdal 
 wrote:


I have now done the following:
* Checked the RAID array - no problems found.
* Run fsck.  It found three cases of the block count being incorrect.  I
  don't know which the other two affected files are.
* Run one pass of memtest86+.  Nothing found.

So it seems not to be a problem with the disks.
A bug in ext4?  Well, ext4 has always done its job for me without
problems.
A RAM error that memtest86+ did not find?  Possible.  Once upon a time,
when you bought an ordinary pc, its RAM had ECC as a matter of course;
unfortunately, that is not the case nowadays.

I think I'll let memtest86+ run overnight one of the coming nights.

Unless it is simply a RAM error, then it is a bit scary...


I've now let memtest86+ run for 8 hours, during which it did 14 passes of
all its tests.  It found nothing wrong.
I have seen that a couple times, unlikely but possible. Maybe review 
your RAM configuration too, ensure that the sticks are on the same 
supported refresh rate and distributed across the slots in an approved 
way.


Regards,
Jesper

On 2024-03-19 21:47, Franco Martelli wrote:
> On 19/03/24 at 15:43, Jesper Dybdal wrote:
>
>>
>> My plan is to boot a rescue disk and mount that partition read-only.
>> Then:
>> * If the file looks ok after reboot, then I'll strongly suspect the
>> RAM - and run memtest.
>> * Otherwise, I'll have to run fsck and see what happens.
>>
>> kernel version:
>> root@nuser:~# uname -a
>> Linux nuser 5.10.0-28-amd64 #1 SMP Debian 5.10.209-2 (2024-01-31)
>> x86_64 GNU/Linux
>>
>> The partition in question is a RAID 1 controlled by md.
>
> Another check you can perform it is on the RAID array, by default it
> runs on the first Sunday of each month at 00:57. You should have this
> file /etc/cron.d/mdadm that takes care to run this check monthly.
>
> Before you reboot, does it look OK /proc/mdstat ?
>

--
Jesper Dybdal
https://www.dybdal.dk


Re: Filsystemkorruption i ext4?

2024-03-20 Thread Nicholas Geovanis
On Wed, Mar 20, 2024, 11:28 AM Jesper Dybdal 
wrote:

> I have now done the following:
> * Checked the RAID array - no problems found.
> * Run fsck.  It found three cases of the block count being incorrect.  I
> don't know which the other two affected files are.
> * Run one pass of memtest86+.  Nothing found.
>
> So it seems not to be a problem with the disks.
> A bug in ext4?  Well, ext4 has always done its job for me without problems.
> A RAM error that memtest86+ did not find?  Possible.  Once upon a time,
> when you bought an ordinary pc, its RAM had ECC as a matter of course;
> unfortunately, that is not the case nowadays.
>
> I think I'll let memtest86+ run overnight one of the coming nights.
>
> Unless it is simply a RAM error, then it is a bit scary...
>

I have seen that a couple of times; unlikely but possible. Maybe review your
RAM configuration too: make sure that the sticks run at the same supported
refresh rate and are distributed across the slots in an approved way.

> Regards,
> Jesper
>
> On 2024-03-19 21:47, Franco Martelli wrote:
> > On 19/03/24 at 15:43, Jesper Dybdal wrote:
> >
> >>
> >> My plan is to boot a rescue disk and mount that partition read-only.
> >> Then:
> >> * If the file looks ok after reboot, then I'll strongly suspect the
> >> RAM - and run memtest.
> >> * Otherwise, I'll have to run fsck and see what happens.
> >>
> >> kernel version:
> >> root@nuser:~# uname -a
> >> Linux nuser 5.10.0-28-amd64 #1 SMP Debian 5.10.209-2 (2024-01-31)
> >> x86_64 GNU/Linux
> >>
> >> The partition in question is a RAID 1 controlled by md.
> >
> > Another check you can perform it is on the RAID array, by default it
> > runs on the first Sunday of each month at 00:57. You should have this
> > file /etc/cron.d/mdadm that takes care to run this check monthly.
> >
> > Before you reboot, does it look OK /proc/mdstat ?
> >
>
> --
> Jesper Dybdal
> https://www.dybdal.dk
>
>
>
>


Re: Filsystemkorruption i ext4?

2024-03-20 Thread Franco Martelli

On 20/03/24 at 09:15, Jesper Dybdal wrote:

> [Sorry for the accidental Danish-language subject line :-( ]
>
> On 2024-03-19 21:47, Franco Martelli wrote:
>
> > On 19/03/24 at 15:43, Jesper Dybdal wrote:
> >
> > > My plan is to boot a rescue disk and mount that partition read-only.
> > > Then:
> > > * If the file looks ok after reboot, then I'll strongly suspect the
> > > RAM - and run memtest.
> > > * Otherwise, I'll have to run fsck and see what happens.
> > >
> > > kernel version:
> > > root@nuser:~# uname -a
> > > Linux nuser 5.10.0-28-amd64 #1 SMP Debian 5.10.209-2 (2024-01-31)
> > > x86_64 GNU/Linux
> > >
> > > The partition in question is a RAID 1 controlled by md.
> >
> > Another check you can perform it is on the RAID array, by default it
> > runs on the first Sunday of each month at 00:57. You should have this
> > file /etc/cron.d/mdadm that takes care to run this check monthly.
>
> Good idea!  That should of course be done first.  It's running now.
>
> > Before you reboot, does it look OK /proc/mdstat ?
>
> Yes, it seems ok.


I would suggest mounting the filesystem read-only, yes, but also with
the noload option ( … -o ro,noload … ); see "man mount" for a brief
explanation.
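
As a rough sketch, such a mount could look like this. The device /dev/md0,
the mount point /mnt/inspect and the function names are placeholders chosen
for the example. Keep in mind that with noload the journal is not replayed,
so a filesystem that was not cleanly unmounted may look inconsistent.

```shell
#!/bin/sh
# Sketch: mount an ext4 filesystem read-only without replaying its journal.
# /dev/md0 and /mnt/inspect are placeholders.

# Succeed if the comma-separated option list $1 contains option $2.
has_option() {
    case ",$1," in
        *",$2,"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Mount device $1 on directory $2.
#   ro      read-only
#   noload  do not load (and therefore not replay) the ext4 journal
inspect_mount() {
    mkdir -p "$2" && mount -t ext4 -o ro,noload "$1" "$2"
}

# Succeed if mount point $1 is listed as read-only in /proc/mounts.
mounted_readonly() {
    opts=$(awk -v mp="$1" '$2 == mp { o = $4 } END { print o }' /proc/mounts)
    has_option "$opts" ro
}

# Usage (as root, from the rescue system):
#   inspect_mount /dev/md0 /mnt/inspect
#   mounted_readonly /mnt/inspect && ls -l /mnt/inspect
#   umount /mnt/inspect
```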


Cheers

--
Franco Martelli



Re: Filsystemkorruption i ext4?

2024-03-20 Thread Jesper Dybdal

I have now done the following:
* Checked the RAID array - no problems found.
* Run fsck.  It found three cases of the block count being incorrect.  I
  don't know which the other two affected files are.
* Run one pass of memtest86+.  Nothing found.
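
If the fsck output is kept, the affected files can usually be identified
afterwards from the inode numbers it prints. The sketch below assumes the
log contains lines of the form "Inode 12345, i_blocks is 16, should be 8.",
which is how e2fsck normally reports a wrong block count; the log file name,
the device /dev/md0 and the function names are made up for the example.

```shell
#!/bin/sh
# Sketch: list inode numbers from a saved e2fsck log and map them to paths.
# Assumes log lines like "Inode 12345, i_blocks is 16, should be 8."

# Print one inode number per matching line of the log file $1.
inodes_with_bad_block_count() {
    sed -n 's/^Inode \([0-9][0-9]*\), i_blocks is .*/\1/p' "$1"
}

# Ask debugfs (which opens the device read-only unless told otherwise)
# for the path names belonging to the given inode numbers.
paths_for_inodes() {
    dev="$1"
    shift
    debugfs -R "ncheck $*" "$dev"
}

# Usage (device and log name are examples):
#   e2fsck -f -n /dev/md0 | tee fsck.log
#   paths_for_inodes /dev/md0 $(inodes_with_bad_block_count fsck.log)
```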

So it seems not to be a problem with the disks.
A bug in ext4?  Well, ext4 has always done its job for me without problems.
A RAM error that memtest86+ did not find?  Possible.  Once upon a time, 
when you bought an ordinary pc, its RAM had ECC as a matter of course; 
unfortunately, that is not the case nowadays.


I think I'll let memtest86+ run overnight one of the coming nights.

Unless it is simply a RAM error, then it is a bit scary...

Regards,
Jesper




--
Jesper Dybdal
https://www.dybdal.dk





Re: Filsystemkorruption i ext4?

2024-03-20 Thread Jesper Dybdal

[Sorry for the accidental Danish-language subject line :-( ]

On 2024-03-19 21:47, Franco Martelli wrote:

> On 19/03/24 at 15:43, Jesper Dybdal wrote:
>
> > My plan is to boot a rescue disk and mount that partition read-only.
> > Then:
> > * If the file looks ok after reboot, then I'll strongly suspect the
> > RAM - and run memtest.
> > * Otherwise, I'll have to run fsck and see what happens.
> >
> > kernel version:
> > root@nuser:~# uname -a
> > Linux nuser 5.10.0-28-amd64 #1 SMP Debian 5.10.209-2 (2024-01-31)
> > x86_64 GNU/Linux
> >
> > The partition in question is a RAID 1 controlled by md.
>
> Another check you can perform it is on the RAID array, by default it
> runs on the first Sunday of each month at 00:57. You should have this
> file /etc/cron.d/mdadm that takes care to run this check monthly.

Good idea!  That should of course be done first.  It's running now.

> Before you reboot, does it look OK /proc/mdstat ?

Yes, it seems ok.

Thanks,
Jesper

--
Jesper Dybdal
https://www.dybdal.dk





Re: Filsystemkorruption i ext4?

2024-03-19 Thread Franco Martelli

On 19/03/24 at 15:43, Jesper Dybdal wrote:

> My plan is to boot a rescue disk and mount that partition read-only. Then:
> * If the file looks ok after reboot, then I'll strongly suspect the RAM
>   - and run memtest.
> * Otherwise, I'll have to run fsck and see what happens.
>
> kernel version:
> root@nuser:~# uname -a
> Linux nuser 5.10.0-28-amd64 #1 SMP Debian 5.10.209-2 (2024-01-31) x86_64 GNU/Linux
>
> The partition in question is a RAID 1 controlled by md.

Another check you can perform is on the RAID array. By default it
runs on the first Sunday of each month at 00:57. You should have the
file /etc/cron.d/mdadm, which takes care of running this check monthly.

Before you reboot, does /proc/mdstat look OK?
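
For reference, a check can also be started by hand through the md sysfs
interface instead of waiting for the monthly cron job. This is only a
sketch: "md0" is a placeholder for the real array name and the function
names are invented for the example.

```shell
#!/bin/sh
# Sketch: trigger and inspect an md RAID consistency check.
# "md0" is a placeholder for the actual array name.

# Succeed if the mdstat text in file $1 (default /proc/mdstat) shows a
# missing member, i.e. an underscore in a status field such as [U_].
mdstat_degraded() {
    grep -q '\[[U_]*_[U_]*]' "${1:-/proc/mdstat}"
}

# Start a check on one array via sysfs (needs root).
start_check() {
    echo check > "/sys/block/$1/md/sync_action"
}

# After the check has finished, print the mismatch counter for the array.
mismatch_count() {
    cat "/sys/block/$1/md/mismatch_cnt"
}

# Usage:
#   cat /proc/mdstat
#   mdstat_degraded && echo "at least one array is degraded"
#   start_check md0
#   mismatch_count md0     # once /proc/mdstat no longer shows the check
```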

--
Franco Martelli