On 5/27/23 08:17, Dr. Thorsten Brandau wrote:
CHECK TABLE File;
+-------------+-------+----------+---------------------------------------------------+
| Table       | Op    | Msg_type | Msg_text                                          |
+-------------+-------+----------+---------------------------------------------------+
| bacula.File | check | Warning  | InnoDB: The B-tree of index PRIMARY is corrupted. |
| bacula.File | check | error    | Corrupt                                           |
+-------------+-------+----------+---------------------------------------------------+
And I cannot recreate the indices; as before, I get the same error
message: ERROR 1030 (HY000): Got error 106 "Transport endpoint is
already connected" from storage engine InnoDB.
When I check the indices:
show indexes from File;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Ignored |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+
| File  |          0 | PRIMARY  |            1 | FileId      | A         |    46738695 | NULL     | NULL   |      | BTREE      |         |               | NO      |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+
1 row in set (0,001 sec)
Well, I tried a bit more and found some hints to remove the
AUTO_INCREMENT from the FileId field and re-add it. However, I could
remove it, but not add it again (ERROR 1034 (HY000): Index for table
'File' is corrupt; try to repair it).
So I am pretty much on the verge of having the database killed.
OK Doc, there's one other thing we can try here: a manual table
rebuild. You've already got the SHOW CREATE TABLE output from the old
table, which verified that it was properly created as an InnoDB table,
and you've got plenty of available storage.
So first create a new empty table:
CREATE TABLE `File_NEW` (
  `FileId` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `FileIndex` int(11) DEFAULT 0,
  `JobId` int(10) unsigned NOT NULL,
  `PathId` int(10) unsigned NOT NULL,
  `Filename` blob NOT NULL,
  `DeltaSeq` smallint(5) unsigned DEFAULT 0,
  `MarkId` int(10) unsigned DEFAULT 0,
  `LStat` tinyblob NOT NULL,
  `MD5` tinyblob DEFAULT NULL,
  PRIMARY KEY (`FileId`)
) ENGINE=InnoDB;
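One optional sanity check before loading any data into it: compare the
new table's definition against the old one. Apart from the table name,
the two should match:

SHOW CREATE TABLE File_NEW\G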
Then we need to copy the rows from the damaged table into it. With the
indexes damaged this will be SLOW, but should still be doable. You have
about 70 million rows, vs. my 40 million.
Here's how I would do it:
DROP PROCEDURE IF EXISTS moveFileRecords;

DELIMITER //
CREATE PROCEDURE moveFileRecords(IN _startOffset INT UNSIGNED,
                                 IN _batchSize INT UNSIGNED)
BEGIN
    DECLARE _copied, _moved, _offset INT UNSIGNED DEFAULT 0;
    IF (_startOffset IS NOT NULL) THEN
        SET _offset = _startOffset;
    END IF;
    REPEAT
        -- Copy one batch; INSERT IGNORE skips any rows already in File_NEW
        INSERT IGNORE INTO File_NEW
            SELECT * FROM File
            ORDER BY FileId
            LIMIT _batchSize
            OFFSET _offset;
        SET _copied = ROW_COUNT();
        SET _moved = _moved + _copied;
        SET _offset = _offset + _batchSize;
        -- Report progress after each batch
        SELECT _moved AS 'Total Rows Imported',
               _offset AS 'Current Offset',
               _copied AS 'This batch';
    UNTIL _copied < _batchSize
    END REPEAT;
END //
DELIMITER ;
MySQL 127.0.0.1> CALL moveFileRecords(0,10000);
+---------------------+----------------+------------+
| Total Rows Imported | Current Offset | This batch |
+---------------------+----------------+------------+
|               10000 |          10000 |      10000 |
+---------------------+----------------+------------+
1 row in set (0.562 sec)
...
+---------------------+----------------+------------+
| Total Rows Imported | Current Offset | This batch |
+---------------------+----------------+------------+
|              370000 |         370000 |      10000 |
+---------------------+----------------+------------+
1 row in set (43.798 sec)
...
I'd probably start with a batch size of 10k to verify it's working, then
go up to 100k. You can interrupt it at any time and skip already-copied
rows by passing the last 'Current Offset' as _startOffset.
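For example, if the last progress report before you interrupted it
showed a Current Offset of 370000, resuming with 100k batches would
look like this (the numbers here are just illustrative):

CALL moveFileRecords(370000, 100000);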
Now, DISCLAIMERS:
This is going to be slow, because the PRIMARY key is borked. And it's
going to slow down as we progress through the table because each pass
will have to skip over more and more records. It may be VERY slow.
(Also, your performance WILL differ from mine, because we have very
different hardware and database configuration, I'm doing this on a
3-node synchronous cluster.)
There are three reasons for doing it this way:
1. You can see that it's visibly making progress, not just going out to
lunch
2. Copying the rows in small batches reduces the memory required for
each copy operation. Copying 70 million rows at once is a huge
transaction, and if it fails for some reason in the middle, all that
work is lost.
3. If the copy is going to fail for any reason, we want to copy as many
rows as possible BEFORE it fails *and know where it failed*.
If it fails partway through, say at 35000000, you MAY be able to resume
by passing it a starting offset of 35100000 and see if you can resume
after the failure.
I'm also not deleting the copied rows from the source table, for two
reasons:
1. Bulk deletes in MySQL are slow and it would cost us more time than
it would save
2. We want to preserve all of the original data until we're sure we've
got it all.
If you prefer, you can TRY doing a single one-shot batch copy first:
INSERT INTO File_NEW SELECT * FROM File;
But this may run for hours with no visible output. You can monitor it
from another MySQL shell using SHOW PROCESSLIST to verify that it is
still running.
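For example, from that second shell (the WHERE filter is just one way
to narrow it down to the copy itself):

SHOW FULL PROCESSLIST;
-- or, to watch just the copy:
SELECT ID, TIME, STATE, INFO
  FROM information_schema.PROCESSLIST
 WHERE INFO LIKE 'INSERT INTO File_NEW%';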
Once you can (hopefully) get one or the other method to complete without
erroring out:
RENAME TABLE File TO File_DAMAGED, File_NEW TO File;
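Before you drop File_DAMAGED, it's worth comparing row counts to
confirm the copy got everything (though note the count on the damaged
table may itself error out, given the state of its PRIMARY index):

SELECT COUNT(*) FROM File;
SELECT COUNT(*) FROM File_DAMAGED;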
And then if I were you, I'd run a dbcheck admin job. Then restart your
Director and see how it all looks.
I can't guarantee this will work, but there's *clearly* something screwy
in your DB at the tablespace level or lower, and without poking around
in your database myself I can't tell what it is. It's possible there
was a corrupted sector where the index is stored. Yeah, I know: RAID,
checksums, that's not supposed to happen. But there is a thing that
storage admins lose sleep over called the RAID5 write hole, and it's
one way that silent data corruption can happen on a RAID5 array.
(Speaking of which, if you have the ability to do so, if I were you I'd
try to provision a separate RAID10, *NOT* RAID5, volume for your
database storage. When it comes to large database operations, I/O
throughput is next to godliness.)
--
Phil Stracchino
Babylon Communications
ph...@caerllewys.net
p...@co.ordinate.org
Landline: +1.603.293.8485
Mobile: +1.603.998.6958
_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users