pvillard31 opened a new pull request, #10629: URL: https://github.com/apache/nifi/pull/10629
# Summary

NIFI-15329 - Fix RenameRecordField to properly handle multiple records

The `RenameRecordField` processor had a data loss bug when processing FlowFiles containing multiple records. Only the first record in a FlowFile was correctly renamed, while all subsequent records lost the renamed field's value (appearing as `null`, or completely missing when null suppression was enabled in the writer).

The bug had two causes:

1. **Shared Schema Modification**: Multiple records share the same schema object for memory efficiency. When `MapRecord.rename()` was called on the first record, it modified this shared schema (changing `name` → `newName`). When subsequent records tried to rename the same field, the schema no longer contained the original field name, causing the rename to fail silently.
2. **Nested Record Schema Propagation**: When renaming fields in nested records (e.g., `/addresses[*]/street`), the parent record's schema still referenced the old nested schema. The writer would then look for the wrong field names when serializing.

1st change: `MapRecord.rename()` - Defensive Schema Copy. This ensures each record gets its own schema copy before modification, preventing changes from affecting other records sharing the same schema.

2nd change: `MapRecord.regenerateSchema()` - Enhanced to recursively regenerate schemas for nested records within arrays.

3rd change: `RenameRecordField.process()` - Added a call to propagate nested schema changes to the parent record.

Updated all existing test input files to use multiple records (2 per file) to properly test the multi-record scenario.

### Performance Considerations

The fix introduces some overhead, but it's optimized for typical use cases.
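
The overhead comes from giving each record its own schema copy before a rename. The following is a minimal, self-contained Java sketch of that idea, not NiFi's actual code: `SimpleSchema`, `SimpleRecord`, `person`, and `renameAll` are hypothetical stand-ins invented for illustration, and the `mutableSchema` field only mirrors the flag named in this description. It shows how a schema shared by two records makes the second rename fail silently, and how copying the schema once per record avoids that.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: these types are hypothetical and are NOT NiFi's MapRecord/RecordSchema.
public class SharedSchemaSketch {

    /** Hypothetical schema: just an ordered, mutable list of field names. */
    static final class SimpleSchema {
        private final List<String> fieldNames;

        SimpleSchema(List<String> fieldNames) {
            this.fieldNames = new ArrayList<>(fieldNames);
        }

        SimpleSchema copy() {
            return new SimpleSchema(fieldNames);
        }

        boolean contains(String name) {
            return fieldNames.contains(name);
        }

        /** Renames a field in place; returns false if the field is not present. */
        boolean renameField(String from, String to) {
            int index = fieldNames.indexOf(from);
            if (index < 0) {
                return false;
            }
            fieldNames.set(index, to);
            return true;
        }
    }

    /** Hypothetical record: a value map plus a schema that may be shared with other records. */
    static final class SimpleRecord {
        private SimpleSchema schema;
        private final Map<String, Object> values;
        private final boolean copyOnRename;
        private boolean mutableSchema; // true once this record owns a private schema copy
        int schemaCopies;              // counts copies, to show the copy happens once per record

        SimpleRecord(SimpleSchema schema, Map<String, Object> values, boolean copyOnRename) {
            this.schema = schema;
            this.values = values;
            this.copyOnRename = copyOnRename;
        }

        void rename(String from, String to) {
            if (copyOnRename && !mutableSchema) {
                schema = schema.copy(); // defensive copy: other records keep the original schema
                mutableSchema = true;
                schemaCopies++;
            }
            if (!schema.renameField(from, to)) {
                return; // field missing from the schema: the rename is silently skipped
            }
            values.put(to, values.remove(from));
        }

        /** Reads a value the way a schema-driven writer would: by the schema's field names. */
        Object read(String field) {
            return schema.contains(field) ? values.get(field) : null;
        }
    }

    static SimpleRecord person(SimpleSchema schema, int id, String name, boolean copyOnRename) {
        Map<String, Object> values = new LinkedHashMap<>();
        values.put("id", id);
        values.put("name", name);
        return new SimpleRecord(schema, values, copyOnRename);
    }

    /** Renames "name" to "newName" on two records sharing one schema; returns the values read back. */
    static List<Object> renameAll(boolean copyOnRename) {
        SimpleSchema shared = new SimpleSchema(List.of("id", "name"));
        List<Object> result = new ArrayList<>();
        for (SimpleRecord record : List.of(person(shared, 1, "Alice", copyOnRename),
                                           person(shared, 2, "Bob", copyOnRename))) {
            record.rename("name", "newName");
            result.add(record.read("newName"));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(renameAll(false)); // prints [Alice, null]
        System.out.println(renameAll(true));  // prints [Alice, Bob]
    }
}
```

In this sketch the copy is guarded by a flag, so several renames on the same record pay for only one schema copy; the real change may differ in its details.
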
#### Performance Impact by Scenario

| Scenario | Impact | Notes |
|----------|--------|-------|
| Single record per FlowFile | Negligible | One schema copy, one `regenerateSchema()` call |
| Multiple records, top-level renames | Low | Schema copy once per record; `regenerateSchema()` iterates fields only |
| Multiple records, nested renames | Low-Moderate | Same as above, plus nested record iteration |
| Large arrays of records (1000+ elements) | Moderate | `regenerateSchema()` iterates all array elements |
| Deeply nested structures (3+ levels) | Moderate-High | Recursive processing multiplies cost |
| Very wide schemas (100+ fields) | Low-Moderate | Larger schema copies |

#### Optimizations Applied

- **`mutableSchema` flag**: Schema is copied only once per record, not once per rename operation
- **Lazy evaluation**: Schema regeneration happens only after all renames are complete

#### When Performance Is Not a Concern

- Typical record structures (10-50 fields)
- Moderate array sizes (< 1000 elements)
- 1-2 levels of nesting

These represent the vast majority of real-world use cases.

# Tracking

Please complete the following tracking steps prior to pull request creation.

### Issue Tracking

- [ ] [Apache NiFi Jira](https://issues.apache.org/jira/browse/NIFI) issue created

### Pull Request Tracking

- [ ] Pull Request title starts with Apache NiFi Jira issue number, such as `NIFI-00000`
- [ ] Pull Request commit message starts with Apache NiFi Jira issue number, such as `NIFI-00000`

### Pull Request Formatting

- [ ] Pull Request based on current revision of the `main` branch
- [ ] Pull Request refers to a feature branch with one commit containing changes

# Verification

Please indicate the verification steps performed prior to pull request creation.

### Build

- [ ] Build completed using `./mvnw clean install -P contrib-check`
  - [ ] JDK 21
  - [ ] JDK 25

### Licensing

- [ ] New dependencies are compatible with the [Apache License 2.0](https://apache.org/licenses/LICENSE-2.0) according to the [License Policy](https://www.apache.org/legal/resolved.html)
- [ ] New dependencies are documented in applicable `LICENSE` and `NOTICE` files

### Documentation

- [ ] Documentation formatting appears as expected in rendered files
