Re: [CentOS] Problem with mdadm, raid1 and automatically adds any disk to raid

2019-02-24 Thread Jobst Schmalenbach
On Mon, Feb 25, 2019 at 06:50:11AM +0100, Simon Matter via CentOS 
(centos@centos.org) wrote:
> > Hi.
> >
> >   dd if=/dev/zero of=/dev/sdX bs=512 seek=$(($(blockdev --getsz
> > /dev/sdX)-1024)) count=1024
> 
> I didn't check, but are you really sure you're cleaning up the end of the
> drive? Maybe you should clean the end of every partition first, because md
> metadata may be written there.

Mmmmhhh, not sure.
I ran fdisk on it, basically re-creating everything from the start.

The "trying to re-create the MDX's" happens when I use "w" in fdisk.
As soon as I hit the "w" it starts re-creating the MDx!

Thats the annoying part.

[snip]
> > No matter what I do, as soon as I hit "w" in fdisk systemd tries to
> > assemble the arrays again without letting me decide what to do.
> 
> 

I am not ;-), it's @ work.


Jobst


-- 
You seem (in my (humble) opinion (which doesn't mean much)) to be (or possibly 
could be) more of a Lisp programmer (but I could be (and probably am) wrong)

  | |0| |   Jobst Schmalenbach, General Manager
  | | |0|   Barrett & Sales Essentials
  |0|0|0|   +61 3 9533 , POBox 277, Caulfield South, 3162, Australia
___
CentOS mailing list
CentOS@centos.org
https://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] Problem with mdadm, raid1 and automatically adds any disk to raid

2019-02-24 Thread Simon Matter via CentOS
> Hi.
>
> CentOS 7.6.1810, fresh install - I use this as a base to create/upgrade
> new/old machines.
>
> I was trying to set up two disks as a RAID1 array, using these commands:
>
>   mdadm --create --verbose /dev/md0 --level=0 --raid-devices=2 /dev/sdb1
> /dev/sdc1
>   mdadm --create --verbose /dev/md1 --level=0 --raid-devices=2 /dev/sdb2
> /dev/sdc2
>   mdadm --create --verbose /dev/md2 --level=0 --raid-devices=2 /dev/sdb3
> /dev/sdc3
>
> then I did an lsblk and realized that I had used --level=0 instead of
> --level=1 (a typo).
> The SIZE was reported as double because I had created a striped set by
> mistake, when I wanted a mirrored one.
>
> Here my problem starts: I cannot get rid of the /dev/mdX devices no matter
> what I do (or try to do).
>
> I tried to delete the mdX devices: I removed the disks by failing them,
> then removed each of the arrays md0, md1 and md2.
> I also did
>
>   dd if=/dev/zero of=/dev/sdX bs=512 seek=$(($(blockdev --getsz
> /dev/sdX)-1024)) count=1024

I didn't check, but are you really sure you're cleaning up the end of the
drive? Maybe you should clean the end of every partition first, because md
metadata may be written there.
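
Something along these lines should cover both ends of every member partition
(an untested sketch, using the sdb/sdc partitions from your mail; wipefs is
just an extra belt-and-braces step):

  # Stop anything that got auto-assembled first.
  mdadm --stop /dev/md0 /dev/md1 /dev/md2

  # Zero the md superblock and both ends of every member partition,
  # not just the whole disk.
  for part in /dev/sd{b,c}{1,2,3}; do
      mdadm --zero-superblock "$part"
      dd if=/dev/zero of="$part" bs=512 count=1024
      dd if=/dev/zero of="$part" bs=512 count=1024 \
         seek=$(( $(blockdev --getsz "$part") - 1024 ))
  done

  # wipefs removes any remaining filesystem/RAID signatures it recognizes.
  wipefs -a /dev/sd{b,c}{1,2,3}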

>   dd if=/dev/zero of=/dev/sdX bs=512 count=1024
>   mdadm --zero-superblock /dev/sdX
>
> Then I wiped each partition of the drives using fdisk.
>
> Now every time I start fdisk to set up a new set of partitions, as soon as
> I hit "w" in fdisk I see this in /var/log/messages:
>
>   Feb 25 15:38:32 webber systemd: Started Timer to wait for more drives
> before activating degraded array md2..
>   Feb 25 15:38:32 webber systemd: Started Timer to wait for more drives
> before activating degraded array md1..
>   Feb 25 15:38:32 webber systemd: Started Timer to wait for more drives
> before activating degraded array md0..
>   Feb 25 15:38:32 webber kernel: md/raid1:md0: active with 1 out of 2
> mirrors
>   Feb 25 15:38:32 webber kernel: md0: detected capacity change from 0 to
> 5363466240
>   Feb 25 15:39:02 webber systemd: Created slice
> system-mdadm\x2dlast\x2dresort.slice.
>   Feb 25 15:39:02 webber systemd: Starting Activate md array md1 even
> though degraded...
>   Feb 25 15:39:02 webber systemd: Starting Activate md array md2 even
> though degraded...
>   Feb 25 15:39:02 webber kernel: md/raid1:md1: active with 0 out of 2
> mirrors
>   Feb 25 15:39:02 webber kernel: md1: failed to create bitmap (-5)
>   Feb 25 15:39:02 webber mdadm: mdadm: failed to start array /dev/md/1:
> Input/output error
>   Feb 25 15:39:02 webber systemd: mdadm-last-resort@md1.service: main
> process exited, code=exited, status=1/FAILURE
>
> I check /proc/mdstat and sure enough, there it is, trying to assemble an
> array I DID NOT TELL IT TO CREATE.
>
> I do NOT WANT this to happen; it creates the same "SHIT" (the incorrect
> array) over and over again (systemd frustration).

Noo, you're wiping it wrong :-)

> So I tried to delete them again, wiped them again, killed processes, wiped
> disks.
>
> No matter what I do, as soon as I hit "w" in fdisk systemd tries to
> assemble the arrays again without letting me decide what to do.


Nothing easier than that, just terminate systemd while doing the disk
management and restart it after you're done. BTW, PID is 1.


Seriously, there is certainly some systemd unit you may be able to
deactivate before doing such things. However, I don't know which one it
is.
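
Judging from your log, my first guess would be the mdadm-last-resort@ units,
plus udev's incremental assembly. Something like this might keep them out of
the way while you repartition (untested; unmask/restart when you're done):

  # Mask the "last resort" assembly units for these arrays
  # (unit and timer names taken from your log; unmask them afterwards).
  systemctl mask mdadm-last-resort@md0.timer mdadm-last-resort@md0.service
  systemctl mask mdadm-last-resort@md1.timer mdadm-last-resort@md1.service
  systemctl mask mdadm-last-resort@md2.timer mdadm-last-resort@md2.service

  # Or hold back udev event processing while repartitioning:
  udevadm control --stop-exec-queue
  #   ... fdisk / mdadm work goes here ...
  udevadm control --start-exec-queue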

I've been fighting similar crap: on HPE servers, whenever cciss_vol_status
is run through the disk monitoring system and reports the hardware RAID
status, systemd scans all partition tables and tries to detect LVM2 devices
and whatnot. The kernel log is just filled with useless scans and I have no
idea how to get rid of it. Nice new systemd world.

Regards,
Simon

___
CentOS mailing list
CentOS@centos.org
https://lists.centos.org/mailman/listinfo/centos


[CentOS] Problem with mdadm, raid1 and automatically adds any disk to raid

2019-02-24 Thread Jobst Schmalenbach
Hi.

CentOS 7.6.1810, fresh install - I use this as a base to create/upgrade
new/old machines.

I was trying to set up two disks as a RAID1 array, using these commands:

  mdadm --create --verbose /dev/md0 --level=0 --raid-devices=2 /dev/sdb1 
/dev/sdc1
  mdadm --create --verbose /dev/md1 --level=0 --raid-devices=2 /dev/sdb2 
/dev/sdc2
  mdadm --create --verbose /dev/md2 --level=0 --raid-devices=2 /dev/sdb3 
/dev/sdc3

Then I did an lsblk and realized that I had used --level=0 instead of
--level=1 (a typo).
The SIZE was reported as double because I had created a striped set by
mistake, when I wanted a mirrored one.
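
For reference, what I actually meant to run was the mirrored version, i.e.
the same commands with --level=1:

  mdadm --create --verbose /dev/md0 --level=1 --raid-devices=2 /dev/sdb1 /dev/sdc1
  mdadm --create --verbose /dev/md1 --level=1 --raid-devices=2 /dev/sdb2 /dev/sdc2
  mdadm --create --verbose /dev/md2 --level=1 --raid-devices=2 /dev/sdb3 /dev/sdc3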

Here my problem starts: I cannot get rid of the /dev/mdX devices no matter
what I do (or try to do).

I tried to delete the mdX devices: I removed the disks by failing them, then
removed each of the arrays md0, md1 and md2.
I also did

  dd if=/dev/zero of=/dev/sdX bs=512 seek=$(($(blockdev --getsz 
/dev/sdX)-1024)) count=1024
  dd if=/dev/zero of=/dev/sdX bs=512 count=1024
  mdadm --zero-superblock /dev/sdX

Then I wiped each partition of the drives using fdisk.

Now every time I start fdisk to set up a new set of partitions, as soon as I
hit "w" in fdisk I see this in /var/log/messages:

  Feb 25 15:38:32 webber systemd: Started Timer to wait for more drives before 
activating degraded array md2..
  Feb 25 15:38:32 webber systemd: Started Timer to wait for more drives before 
activating degraded array md1..
  Feb 25 15:38:32 webber systemd: Started Timer to wait for more drives before 
activating degraded array md0..
  Feb 25 15:38:32 webber kernel: md/raid1:md0: active with 1 out of 2 mirrors
  Feb 25 15:38:32 webber kernel: md0: detected capacity change from 0 to 
5363466240
  Feb 25 15:39:02 webber systemd: Created slice 
system-mdadm\x2dlast\x2dresort.slice.
  Feb 25 15:39:02 webber systemd: Starting Activate md array md1 even though 
degraded...
  Feb 25 15:39:02 webber systemd: Starting Activate md array md2 even though 
degraded...
  Feb 25 15:39:02 webber kernel: md/raid1:md1: active with 0 out of 2 mirrors
  Feb 25 15:39:02 webber kernel: md1: failed to create bitmap (-5)
  Feb 25 15:39:02 webber mdadm: mdadm: failed to start array /dev/md/1: 
Input/output error
  Feb 25 15:39:02 webber systemd: mdadm-last-resort@md1.service: main process 
exited, code=exited, status=1/FAILURE

I check /proc/mdstat and sure enough, there it is, trying to assemble an
array I DID NOT TELL IT TO CREATE.

I do NOT WANT this to happen; it creates the same "SHIT" (the incorrect array)
over and over again (systemd frustration).
So I tried to delete them again, wiped them again, killed processes, wiped 
disks.

No matter what I do, as soon as I hit "w" in fdisk systemd tries to assemble
the arrays again without letting me decide what to do.


Help!
Jobst



-- 
windoze 98:  useless extension to a minor patch release for 32-bit 
extensions and a graphical shell for a 16-bit patch to an 8-bit operating 
system originally coded for a 4-bit microprocessor, written by a 2-bit company 
that can't stand for 1 bit of competition!

  | |0| |   Jobst Schmalenbach, General Manager
  | | |0|   Barrett & Sales Essentials
  |0|0|0|   +61 3 9533 , POBox 277, Caulfield South, 3162, Australia
___
CentOS mailing list
CentOS@centos.org
https://lists.centos.org/mailman/listinfo/centos


[CentOS] Nvme m.2 disk problem

2019-02-24 Thread Alessandro Baggi

Hi list,
I'm running CentOS 7.6 on a Corsair Force MP500 120 GB. The root fs is ext4
and this drive is ~1 year old.

The system works very well except at boot.
During the boot process I always get a file system check on the NVMe drive.

Running smartctl on this drive I get this:


=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02, NSID 0x1)
Critical Warning:                   0x00
Temperature:                        41 Celsius
Available Spare:                    100%
Available Spare Threshold:          1%
Percentage Used:                    1%
Data Units Read:                    5,355,595 [2,74 TB]
Data Units Written:                 5,826,517 [2,98 TB]
Host Read Commands:                 67,978,550
Host Write Commands:                75,422,898
Controller Busy Time:               32,863
Power Cycles:                       811
Power On Hours:                     2,813
Unsafe Shutdowns:                   317
Media and Data Integrity Errors:    0
Error Information Log Entries:      177
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Temperature Sensor 2:               77 Celsius

Error Information (NVMe Log 0x01, max 64 entries)
Num   ErrCount  SQId   CmdId  Status  PELoc            LBA  NSID  VS
  0        177     0  0x0014  0x4004      -  8796109799680     1   -
  1        176     0  0x0019  0x4004      -  8796109799680     1   -
  2        175     0  0x001a  0x4004      -  8796109799680     1   -
  3        174     0  0x0005  0x4004      -  8796109799680     1   -
  4        173     0  0x000c  0x4004      -  8796109799680     1   -
  5        172     0  0x0019  0x4004      -  8796109799680     1   -
  6        171     0  0x001d  0x4004      -  8796109799680     1   -
  7        170     0  0x0014  0x4004      -  8796109799680     1   -
  8        169     0  0x0011  0x4004      -  8796109799680     1   -
  9        168     0  0x000f  0x4004      -  8796109799680     1   -
 10        167     0  0x      0x4004      -  8796109799680     1   -
 11        166     0  0x0006  0x4004      -  8796109799680     1   -
 12        165     0  0x0008  0x4004      -  8796109799680     1   -
 13        164     0  0x000e  0x4004      -  8796109799680     1   -
 14        163     0  0x0008  0x4004      -  8796109799680     1   -
 15        162     0  0x0006  0x4004      -  8796109799680     1   -
... (48 entries not shown)


I noticed that the Unsafe Shutdowns count increases rapidly and I don't
understand why there should be any unsafe shutdowns at all. Every 3 or 4
boots the value increases by 1.
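
In case it helps, this is roughly how I'm watching the counter and the end of
the previous boot (the nvme0 device name and a persistent journal are
assumptions on my side; nvme-cli may need to be installed separately):

  # Watch the unsafe-shutdown counter across reboots.
  smartctl -a /dev/nvme0 | grep -i 'unsafe shutdowns'
  nvme smart-log /dev/nvme0 | grep -i shutdown

  # See how the previous boot ended; a clean shutdown should show systemd
  # reaching its shutdown target in the last lines of that boot.
  journalctl -b -1 -n 50 --no-pager
  last -x shutdown reboot | head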


I can't find any errors in the system logs.

Can someone point me in the right direction?

Thanks in advance.

Alessandro.
___
CentOS mailing list
CentOS@centos.org
https://lists.centos.org/mailman/listinfo/centos