On 5/27/23 08:17, Dr. Thorsten Brandau wrote:

CHECK TABLE File ;

+-------------+-------+----------+---------------------------------------------------+
| Table       | Op    | Msg_type | Msg_text                                          |
+-------------+-------+----------+---------------------------------------------------+
| bacula.File | check | Warning  | InnoDB: The B-tree of index PRIMARY is corrupted. |
| bacula.File | check | error    | Corrupt                                           |
+-------------+-------+----------+---------------------------------------------------+

And I cannot recreate the indices; as previously, it fails with the same error message: ERROR 1030 (HY000): Got error 106 "Transport endpoint is already connected" from storage engine InnoDB

When I check the indices:

show indexes from File;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Ignored |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+
| File  |          0 | PRIMARY  |            1 | FileId      | A         |    46738695 |     NULL | NULL   |      | BTREE      |         |               | NO      |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+
1 row in set (0,001 sec)


Well, I tried a bit more and found some hints to remove the AUTO_INCREMENT from the FileId field and re-add it. However, I could remove it, but not add it again (ERROR 1034 (HY000): Index for table 'File' is corrupt; try to repair it).

So I am pretty much on the verge of having the database killed.


OK Doc, there's one other thing we can try here, and that is a manual table rebuild. You've already got the SHOW CREATE TABLE from the old table, which was to verify that it was properly created as an InnoDB table, and you've got lots of available storage.

So first create a new empty table:

CREATE TABLE `File_NEW` (
  `FileId` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `FileIndex` int(11) DEFAULT 0,
  `JobId` int(10) unsigned NOT NULL,
  `PathId` int(10) unsigned NOT NULL,
  `Filename` blob NOT NULL,
  `DeltaSeq` smallint(5) unsigned DEFAULT 0,
  `MarkId` int(10) unsigned DEFAULT 0,
  `LStat` tinyblob NOT NULL,
  `MD5` tinyblob DEFAULT NULL,
  PRIMARY KEY (`FileId`)
) ENGINE=InnoDB;
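
Before copying anything in, you can sanity-check the new table with a couple of ordinary MySQL statements (nothing Bacula-specific here), e.g. to confirm it really is InnoDB and starts out empty:

SHOW CREATE TABLE File_NEW\G
SELECT COUNT(*) FROM File_NEW;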


Then we need to copy the rows from the damaged table into it. With the indexes damaged this will be SLOW, but should still be doable. You have about 70 million rows, vs. my 40 million.


Here's how I would do it:


DROP PROCEDURE IF EXISTS moveFileRecords;

DELIMITER //

CREATE PROCEDURE moveFileRecords(IN _startOffset INT UNSIGNED, IN _batchSize INT UNSIGNED)
BEGIN
    DECLARE _copied, _moved, _offset INT UNSIGNED DEFAULT 0;

    -- Allow restarting from a known offset if a previous run was interrupted.
    IF (_startOffset IS NOT NULL) THEN
        SET _offset = _startOffset;
    END IF;

    REPEAT
        -- Copy one batch of rows; IGNORE skips any rows whose FileId
        -- already exists in File_NEW.
        INSERT IGNORE INTO File_NEW
        SELECT * FROM File
        ORDER BY FileId
        LIMIT _batchSize
        OFFSET _offset;

        SET _copied = ROW_COUNT();
        SET _moved = _moved + _copied;
        SET _offset = _offset + _batchSize;

        -- Progress report after each batch.
        SELECT _moved AS 'Total Rows Imported',
            _offset AS 'Current Offset',
            _copied AS 'This batch';

    -- Stop once a batch inserts fewer rows than requested, i.e. we've
    -- reached the end of the source table.
    UNTIL _copied < _batchSize
    END REPEAT;
END //

DELIMITER ;



MySQL 127.0.0.1> CALL moveFileRecords(0,10000);
+---------------------+----------------+------------+
| Total Rows Imported | Current Offset | This batch |
+---------------------+----------------+------------+
|               10000 |          10000 |      10000 |
+---------------------+----------------+------------+
1 row in set (0.562 sec)

...

+---------------------+----------------+------------+
| Total Rows Imported | Current Offset | This batch |
+---------------------+----------------+------------+
|              370000 |         370000 |      10000 |
+---------------------+----------------+------------+
1 row in set (43.798 sec)

...




I'd probably start with a batch size of 10k to verify it's working, then go up to 100k. You can interrupt it at any time and skip already-copied rows by passing the last 'Current Offset' as _startOffset.
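
For example, if the last progress report you saw before interrupting it was the 370000 one above, you could pick up from there with the bigger batch size, something like:

CALL moveFileRecords(370000, 100000);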


Now, DISCLAIMERS:
This is going to be slow, because the PRIMARY key is borked. And it's going to slow down as we progress through the table, because each pass will have to skip over more and more records. It may be VERY slow. (Also, your performance WILL differ from mine, because we have very different hardware and database configurations; I'm doing this on a 3-node synchronous cluster.)


There are three reasons for doing it this way:
1. You can see that it's visibly making progress, not just going out to lunch.
2. Copying the rows in small batches reduces the memory required for each copy operation. Copying 70 million rows at once is a huge transaction, and if it fails for some reason in the middle, all that work is lost.
3. If the copy is going to fail for any reason, we want to copy as many rows as possible BEFORE it fails *and know where it failed*.

If it fails partway through, say at 35000000, you MAY be able to resume by passing it a starting offset of 35100000 and see if you can resume after the failure.
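
In that hypothetical case that would look something like:

CALL moveFileRecords(35100000, 100000);

i.e. skipping one batch's worth of rows past the point where it blew up.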

I'm also not deleting the copied rows from the source table, for two reasons:
1. Bulk deletes in MySQL are slow, and it would cost us more time than it would save.
2. We want to preserve all of the original data until we're sure we've got it all.


If you prefer, you can TRY doing a single one-shot batch copy first:

INSERT INTO File_NEW SELECT * FROM File;

But this may run for hours with no visible output. You can monitor it from another MySQL shell using SHOW PROCESSLIST to verify that it is still running.
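
For example, from a second mysql client session (standard MySQL, nothing exotic):

SHOW FULL PROCESSLIST;

The Time and State columns for the INSERT ... SELECT thread will at least confirm it hasn't died.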


Once you can (hopefully) get one or the other method to complete without erroring out:

RENAME TABLE File TO File_DAMAGED, File_NEW TO File;

And then if I were you, I'd run a dbcheck admin job. Then restart your Director and see how it all looks.
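
If you want one more sanity check around the rename, comparing row counts is cheap on the new table now that its PRIMARY key is healthy (the count on File_DAMAGED may be slow, or may error out, which itself tells you something):

SELECT COUNT(*) FROM File;
SELECT COUNT(*) FROM File_DAMAGED;

And hold off on dropping File_DAMAGED until you're completely sure you no longer need it.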



I can't guarantee this will work, but there's *clearly* something screwy in your DB at the tablespace level or lower, and I can't tell without poking around in your database myself what it is. It's possible there was a corrupted sector where the index is stored. Yeah, I know, RAID, checksums, not supposed to happen; there is a thing that storage admins lose sleep over called the RAID5 write hole. It's a way that silent data corruption can happen on a RAID5 array.


(Speaking of which, if you have the ability to do so, if I were you I'd try to provision a separate RAID10, *NOT* RAID5, volume for your database storage. When it comes to large database operations, I/O throughput is next to godliness.)



--
  Phil Stracchino
  Babylon Communications
  ph...@caerllewys.net
  p...@co.ordinate.org
  Landline: +1.603.293.8485
  Mobile:   +1.603.998.6958



_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
