On 5/27/23 08:17, Dr. Thorsten Brandau wrote:
CHECK TABLE File;
+-------------+-------+----------+---------------------------------------------------+
| Table       | Op    | Msg_type | Msg_text                                          |
+-------------+-------+----------+---------------------------------------------------+
| bacula.File | check | Warning  | InnoDB: The B-tree of index PRIMARY is corrupted. |
| bacula.File | check | error    | Corrupt                                           |
+-------------+-------+----------+---------------------------------------------------+
And I cannot recreate the indices; as before, I get the same error
message: ERROR 1030 (HY000): Got error 106 "Transport endpoint is
already connected" from storage engine InnoDB.
When I check the indices:
show indexes from File;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Ignored |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+
| File  |          0 | PRIMARY  |            1 | FileId      | A         |    46738695 | NULL     | NULL   |      | BTREE      |         |               | NO      |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+
1 row in set (0,001 sec)
Well, I tried a bit more and found some hints to remove the
AUTO_INCREMENT from the FileId field and re-add it. However, I could
remove it, but not add it again (ERROR 1034 (HY000): Index for table
'File' is corrupt; try to repair it).
So I am pretty much on the verge of having the database killed.
OK Doc, there's one other thing we can try here: a manual table
rebuild. You've already got the SHOW CREATE TABLE output from the old
table, which verified that it was properly created as an InnoDB table,
and you've got plenty of available storage.
So first create a new empty table:
CREATE TABLE `File_NEW` (
  `FileId` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `FileIndex` int(11) DEFAULT 0,
  `JobId` int(10) unsigned NOT NULL,
  `PathId` int(10) unsigned NOT NULL,
  `Filename` blob NOT NULL,
  `DeltaSeq` smallint(5) unsigned DEFAULT 0,
  `MarkId` int(10) unsigned DEFAULT 0,
  `LStat` tinyblob NOT NULL,
  `MD5` tinyblob DEFAULT NULL,
  PRIMARY KEY (`FileId`)
) ENGINE=InnoDB;
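One optional sanity check before loading any data into it: compare the
new table's definition against the old one. Apart from the table name,
the two should match:

SHOW CREATE TABLE File_NEW\G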
Then we need to copy the rows from the damaged table into it. With the
indexes damaged this will be SLOW, but should still be doable. You have
about 70 million rows, vs. my 40 million.
Here's how I would do it:
DROP PROCEDURE IF EXISTS moveFileRecords;

DELIMITER //
CREATE PROCEDURE moveFileRecords(IN _startOffset INT UNSIGNED,
                                 IN _batchSize INT UNSIGNED)
BEGIN
    DECLARE _copied, _moved, _offset INT UNSIGNED DEFAULT 0;
    IF (_startOffset IS NOT NULL) THEN
        SET _offset = _startOffset;
    END IF;
    REPEAT
        -- Copy one batch; INSERT IGNORE skips any rows already in File_NEW
        INSERT IGNORE INTO File_NEW
            SELECT * FROM File
            ORDER BY FileId
            LIMIT _batchSize
            OFFSET _offset;
        SET _copied = ROW_COUNT();
        SET _moved = _moved + _copied;
        SET _offset = _offset + _batchSize;
        -- Report progress after each batch
        SELECT _moved AS 'Total Rows Imported',
               _offset AS 'Current Offset',
               _copied AS 'This batch';
    UNTIL _copied < _batchSize
    END REPEAT;
END //
DELIMITER ;
MySQL 127.0.0.1> CALL moveFileRecords(0,10000);
+---------------------+----------------+------------+
| Total Rows Imported | Current Offset | This batch |
+---------------------+----------------+------------+
|               10000 |          10000 |      10000 |
+---------------------+----------------+------------+
1 row in set (0.562 sec)
...
+---------------------+----------------+------------+
| Total Rows Imported | Current Offset | This batch |
+---------------------+----------------+------------+
|              370000 |         370000 |      10000 |
+---------------------+----------------+------------+
1 row in set (43.798 sec)
...
I'd probably start with a batch size of 10k to verify it's working, then
go up to 100k. You can interrupt it at any time and skip already-copied
rows by passing the last 'Current Offset' as _startOffset.
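For example, if the last progress report before you interrupted it
showed a Current Offset of 370000, resuming with 100k batches would
look like this (the numbers here are just illustrative):

CALL moveFileRecords(370000, 100000);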
Now, DISCLAIMERS:
This is going to be slow, because the PRIMARY key is borked. And it's
going to slow down as we progress through the table because each pass
will have to skip over more and more records. It may be VERY slow.
(Also, your performance WILL differ from mine, because we have very
different hardware and database configuration, I'm doing this on a
3-node synchronous cluster.)
There are three reasons for doing it this way:
1. You can see that it's visibly making progress, not just going out to
lunch
2. Copying the rows in small batches reduces the memory required for
each copy operation. Copying 70 million rows at once is a huge
transaction, and if it fails for some reason in the middle, all that
work is lost.
3. If the copy is going to fail for any reason, we want to copy as many
rows as possible BEFORE it fails *and know where it failed*.
If it fails partway through, say at 35000000, you MAY be able to resume
by passing it a starting offset of 35100000 and see if you can resume
after the failure.
I'm also not deleting the copied rows from the source table, for two
reasons:
1. Bulk deletes in MySQL are slow and it would cost us more time than
it would save
2. We want to preserve all of the original data until we're sure we've
got it all.
If you prefer, you can TRY doing a single one-shot batch copy first:
INSERT INTO File_NEW SELECT * FROM File;
But this may run for hours with no visible output. You can monitor it
from another MySQL shell using SHOW PROCESSLIST to verify that it is
still running.
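For example, from that second shell (the WHERE filter is just one way
to narrow it down to the copy itself):

SHOW FULL PROCESSLIST;
-- or, to watch just the copy:
SELECT ID, TIME, STATE, INFO
  FROM information_schema.PROCESSLIST
 WHERE INFO LIKE 'INSERT INTO File_NEW%';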
Once you can (hopefully) get one or the other method to complete without
erroring out:
RENAME TABLE File TO File_DAMAGED, File_NEW TO File;
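Before you drop File_DAMAGED, it's worth comparing row counts to
confirm the copy got everything (though note the count on the damaged
table may itself error out, given the state of its PRIMARY index):

SELECT COUNT(*) FROM File;
SELECT COUNT(*) FROM File_DAMAGED;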
And then if I were you, I'd run a dbcheck admin job. Then restart your
Director and see how it all looks.
I can't guarantee this will work, but there's *clearly* something screwy
in your DB at the tablespace level or lower, and without poking around
in your database myself I can't tell what it is. It's possible there
was a corrupted sector where the index is stored. Yeah, I know: RAID,
checksums, that's not supposed to happen. But there is a thing that
storage admins lose sleep over called the RAID5 write hole, and it's
one way that silent data corruption can happen on a RAID5 array.
(Speaking of which, if you have the ability to do so, if I were you I'd
try to provision a separate RAID10, *NOT* RAID5, volume for your
database storage. When it comes to large database operations, I/O
throughput is next to godliness.)
--
Phil Stracchino
Babylon Communications
ph...@caerllewys.net
p...@co.ordinate.org
Landline: +1.603.293.8485
Mobile: +1.603.998.6958
_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users