Issue 178530
Summary [TableGen] Duplicate Enum Definitions in Register Class Intersections
Labels new issue
Assignees
Reporter dmpots
## Summary

TableGen's name-flattening optimization in register class intersection generation produces duplicate C++ enum definitions, causing compilation failures. When TableGen creates intersection register classes with nested inferred relationships, it can generate the same shortened name for two different register sets.

## Problem Description

While adding a new register class configuration, we encountered a C++ compilation error due to duplicate enum definitions:

```cpp
error: redefinition of enumerator 'R128_Aligned4_with_sub0_in_R32_RestrictedRegClassID'
  R128_Aligned4_with_sub0_in_R32_RestrictedRegClassID = 7,
  ^
note: previous definition is here
  R128_Aligned4_with_sub0_in_R32_RestrictedRegClassID = 9,
  ^
```

The same enum name appears twice in the generated code with different enum values (7 and 9), representing two distinct register sets with different members.

## Steps to Reproduce

A minimal test case has been added to demonstrate this bug: https://github.com/dmpots/llvm-project/commit/30aa1e1172c37e7493f1c0e77ea82e4a2b8a89c7
- **Command**: `llvm-lit llvm/test/TableGen/intersection-class-duplicate-enum.td`
- **Test file**: `llvm/test/TableGen/intersection-class-duplicate-enum.td`
- **Status**: Marked `XFAIL` because the bug is still present

### Manual Reproduction

```bash
# Run the test case
llvm-tblgen -gen-register-info -I llvm/include \
  llvm/test/TableGen/intersection-class-duplicate-enum.td -o - | \
  grep "R128_Aligned4_with_sub0_in_R32_RestrictedRegClassID ="
```

**Actual output (bug present):**
```
R128_Aligned4_with_sub0_in_R32_RestrictedRegClassID = 7,
R128_Aligned4_with_sub0_in_R32_RestrictedRegClassID = 9,
```

The duplicate name appears at enum IDs 7 and 9.
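To check a generated file for this kind of failure without grepping for one specific name, the enumerator lines can be scanned for repeated names. The following is a minimal C++17 sketch. `findDuplicateRegClassIds` is a hypothetical helper, not part of LLVM, and it assumes each enumerator appears on its own line in the `Name = value` form shown above.

```cpp
#include <cassert>
#include <map>
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: returns every *RegClassID enumerator that is defined
// more than once, mapped to the values it was assigned, in input order.
std::map<std::string, std::vector<int>>
findDuplicateRegClassIds(const std::string &GeneratedCode) {
  // Matches enumerator lines such as "  FooRegClassID = 7,".
  static const std::regex EnumLine(R"((\w+RegClassID)\s*=\s*(\d+))");
  std::map<std::string, std::vector<int>> Seen;
  std::istringstream In(GeneratedCode);
  std::string Line;
  while (std::getline(In, Line)) {
    std::smatch M;
    if (std::regex_search(Line, M, EnumLine))
      Seen[M[1].str()].push_back(std::stoi(M[2].str()));
  }
  // Keep only the names that occurred more than once.
  std::map<std::string, std::vector<int>> Dups;
  for (const auto &[Name, Values] : Seen)
    if (Values.size() > 1)
      Dups.emplace(Name, Values);
  return Dups;
}
```

Applied to the enum block produced by the `llvm-tblgen -gen-register-info` command above (read into a `std::string`), this should report a single entry mapping `R128_Aligned4_with_sub0_in_R32_RestrictedRegClassID` to the values 7 and 9.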

### Test Case Structure

The minimal reproducer uses:
- 16 32-bit registers (R0-R15)
- 96-bit and 128-bit register tuples with various spacing
- A restricted register class that excludes R0 (creating intersection differences)

Key register classes:
- `R32_Restricted`: R1-R15 (excludes R0)
- `R96_WithAligned4`: 96-bit tuples at positions 0, 4, 8, 12
- `R96_Aligned8`: 96-bit tuples at positions 0, 8 (subset of above)
- `R128_Aligned4`: 128-bit tuples at positions 0, 4, 8, 12
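To see why the inferred intersections end up holding different registers, the classes above can be modeled as sets of tuple start indices. This is a simplified sketch, not TableGen's inference algorithm. It assumes `sub0` is a tuple's first 32-bit register and that `sub0_sub1_sub2` of a 128-bit tuple is the 96-bit tuple starting at the same register. `StartSet`, `inR32Restricted`, and `inferIntersection` are hypothetical names.

```cpp
#include <cassert>
#include <set>

// Hypothetical model: a tuple class is the set of start indices of its
// tuples (tuple at start S covers R<S>, R<S+1>, ...).
using StartSet = std::set<int>;

const StartSet R128_Aligned4 = {0, 4, 8, 12};
const StartSet R96_WithAligned4 = {0, 4, 8, 12};
const StartSet R96_Aligned8 = {0, 8};

// R32_Restricted holds R1-R15 (R0 excluded).
bool inR32Restricted(int Reg) { return Reg >= 1 && Reg <= 15; }

// Members of R128Class whose low 96-bit tuple (assumed to start at the same
// register) is in R96Class and whose first register (sub0 of that 96-bit
// tuple) is in R32_Restricted.
StartSet inferIntersection(const StartSet &R128Class, const StartSet &R96Class) {
  StartSet Result;
  for (int Start : R128Class)
    if (R96Class.count(Start) && inR32Restricted(Start))
      Result.insert(Start);
  return Result;
}
```

Under these assumptions, `inferIntersection(R128_Aligned4, R96_Aligned8)` yields `{8}` (one register) and `inferIntersection(R128_Aligned4, R96_WithAligned4)` yields `{4, 8, 12}` (three registers), matching the member counts reported for Path A and Path B in the Bug Mechanism section.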

## Expected Behavior

TableGen should generate distinct enum names for different register sets. When the name-shortening logic is disabled, the correct distinct names are generated:

```cpp
enum {
  R128_Aligned4RegClassID = 6,
  R128_Aligned4_with_sub0_sub1_sub2_in_R96_WithAligned4_with_sub0_in_R32_RestrictedRegClassID = 7,
  R128_Aligned4_with_sub0_sub1_sub2_in_R96_Aligned8RegClassID = 8,
  R128_Aligned4_with_sub0_sub1_sub2_in_R96_Aligned8_with_sub0_in_R32_RestrictedRegClassID = 9,
  // ...
};
```

## Actual Behavior

With name-shortening enabled (current behavior), both IDs 7 and 9 get incorrectly flattened to the same name:

```cpp
enum {
  R128_Aligned4RegClassID = 6,
  R128_Aligned4_with_sub0_in_R32_RestrictedRegClassID = 7,    // First occurrence
  R128_Aligned4_with_sub0_sub1_sub2_in_R96_Aligned8RegClassID = 8,
  R128_Aligned4_with_sub0_in_R32_RestrictedRegClassID = 9,    // DUPLICATE!
  // ...
};
```

This causes C++ compilation to fail with "redefinition of enumerator" errors.

## Root Cause Analysis

We believe the bug is related to PR #134865.

The issue is in TableGen's name-shortening logic in `inferMatchingSuperRegClass()`:
- **Location**: `llvm/utils/TableGen/Common/CodeGenRegisters.cpp:2528`
- **Code reference**: https://github.com/llvm/llvm-project/blob/09e59745fc7bc011e908b4e0298327de96ebffaa/llvm/utils/TableGen/Common/CodeGenRegisters.cpp#L2528

### Bug Mechanism

When processing `R128_Aligned4` with composite sub-register indices, TableGen infers intersection classes through different paths:

**Path A** (via R96_Aligned8):
- Full name: `R128_Aligned4_with_sub0_sub1_sub2_in_R96_Aligned8_with_sub0_in_R32_Restricted`
- Contains 1 register
- Flattened to: `R128_Aligned4_with_sub0_in_R32_Restricted`

**Path B** (via R96_WithAligned4):
- Full name: `R128_Aligned4_with_sub0_sub1_sub2_in_R96_WithAligned4_with_sub0_in_R32_Restricted`
- Contains 3 registers
- Flattened to: `R128_Aligned4_with_sub0_in_R32_Restricted` (SAME NAME!)

The name-flattening optimization shortens these long names but does not check whether the shortened name is already in use by a different register set (a different Key, i.e. different members and/or RSI).
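One possible guard is to accept a shortened name only when no class with a different register set has already claimed it, and to fall back to the full name otherwise. The sketch below is a hypothetical illustration, not the upstream fix: `InferredClass` and `chooseName` are invented names, and `Members` stands in for the real Key (members plus RSI).

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical stand-in for an inferred register class. NOT TableGen's
// CodeGenRegisterClass; it only carries what the uniqueness check needs.
struct InferredClass {
  std::string FullName;   // unambiguous long name
  std::string ShortName;  // flattened name, may collide
  std::set<int> Members;  // stand-in for the class Key (members + RSI)
};

// Chooses the name to emit for C. Claimed maps each emitted name to the
// register set that owns it.
std::string chooseName(const InferredClass &C,
                       std::map<std::string, std::set<int>> &Claimed) {
  auto It = Claimed.find(C.ShortName);
  if (It == Claimed.end()) {
    Claimed.emplace(C.ShortName, C.Members);  // first claimant gets the short name
    return C.ShortName;
  }
  if (It->second == C.Members)
    return C.ShortName;  // same register set: this class already owns the name
  // Different register set: keep the full name to avoid a duplicate enumerator.
  Claimed.emplace(C.FullName, C.Members);
  return C.FullName;
}
```

Processing Path B and then Path A, B keeps `R128_Aligned4_with_sub0_in_R32_Restricted` and A falls back to its full name. Because the first class to claim a short name wins, the emitted names still depend on processing order (compare the Name Sensitivity section), so a real fix might prefer a deterministic rule such as using the full name for every class involved in a collision.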

### Name Sensitivity

Interestingly, the bug is sensitive to the exact class names used. In our testing:
- ✅ `R96_WithAligned4` → Bug reproduces
- ❌ `R96_Aligned4` → Bug does NOT reproduce

This suggests the bug depends on TableGen's internal naming heuristics and processing order, which makes the reproducer somewhat fragile.

## Related

- PR #134865 (introduces the name-shortening optimization)
- Reproducer: https://github.com/dmpots/llvm-project/commit/30aa1e1172c37e7493f1c0e77ea82e4a2b8a89c7
_______________________________________________
llvm-bugs mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs