pvillard31 opened a new pull request, #10629:
URL: https://github.com/apache/nifi/pull/10629

   # Summary
   
   NIFI-15329 - Fix RenameRecordField to properly handle multiple records
   
   The `RenameRecordField` processor had a data loss bug when processing 
FlowFiles containing multiple records. Only the first record in a FlowFile was 
correctly renamed, while all subsequent records lost the renamed field's value 
(appearing as `null` or completely missing when null suppression was enabled in 
the writer).
   
   The bug had two root causes:
   
   1. **Shared Schema Modification**: Multiple records share the same schema 
object for memory efficiency. When `MapRecord.rename()` was called on the first 
record, it modified this shared schema in place (changing `name` → `newName`). 
When subsequent records tried to rename the same field, the schema no longer 
contained the original field name, so the rename failed silently.
   
   2. **Nested Record Schema Propagation**: When renaming fields in nested 
records (e.g., `/addresses[*]/street`), the parent record's schema still 
referenced the old nested schema. The writer would then look for the wrong 
field names when serializing.
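   
   The first root cause can be reproduced in isolation. The sketch below is not 
NiFi code: `Schema` and `Rec` are hypothetical stand-ins for the real schema and 
`MapRecord` classes, reduced to a list of field names and a value map, and the 
`rename` body is an assumption about the failure mode described above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SharedSchemaDemo {

    /** Hypothetical stand-in for a record schema: a mutable list of field names. */
    static final class Schema {
        final List<String> fieldNames;

        Schema(List<String> fieldNames) {
            this.fieldNames = new ArrayList<>(fieldNames);
        }
    }

    /** Hypothetical stand-in for a record; this is NOT the real MapRecord. */
    static final class Rec {
        final Schema schema; // the same instance is shared by several records
        final Map<String, Object> values = new HashMap<>();

        Rec(Schema schema) {
            this.schema = schema;
        }

        /** Buggy rename: mutates the shared schema in place. */
        boolean rename(String from, String to) {
            int idx = schema.fieldNames.indexOf(from);
            if (idx < 0) {
                return false; // silent failure: the old name is already gone
            }
            schema.fieldNames.set(idx, to);
            values.put(to, values.remove(from));
            return true;
        }
    }

    public static void main(String[] args) {
        Schema shared = new Schema(List.of("name"));
        Rec first = new Rec(shared);
        Rec second = new Rec(shared);
        first.values.put("name", "Alice");
        second.values.put("name", "Bob");

        System.out.println(first.rename("name", "newName"));  // true
        System.out.println(second.rename("name", "newName")); // false
        System.out.println(second.values.get("newName"));     // null
    }
}
```

   In this model the second record keeps its value under the old key while the 
shared schema only lists `newName`, which is why a writer driven by the schema 
would emit `null` for it.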
   
   1st change: `MapRecord.rename()` - Defensive Schema Copy. Each record now 
gets its own schema copy before modification, so a rename no longer affects 
other records sharing the same schema.
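   
   A minimal sketch of the defensive copy, under the same simplifications: 
`Schema`, `Rec`, and `copy()` are hypothetical, and the `mutableSchema` field is 
modeled on the flag named in this description rather than taken from the actual 
patch. The schema is copied on the first rename only and reused afterwards.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CopyOnRenameDemo {

    /** Hypothetical stand-in for a record schema. */
    static final class Schema {
        final List<String> fieldNames;

        Schema(List<String> fieldNames) {
            this.fieldNames = new ArrayList<>(fieldNames);
        }

        Schema copy() {
            return new Schema(fieldNames);
        }
    }

    /** Hypothetical stand-in for a record; this is NOT the real MapRecord. */
    static final class Rec {
        private Schema schema;
        private boolean mutableSchema = false; // true once this record owns its schema
        final Map<String, Object> values = new HashMap<>();

        Rec(Schema schema) {
            this.schema = schema;
        }

        Schema schema() {
            return schema;
        }

        boolean rename(String from, String to) {
            int idx = schema.fieldNames.indexOf(from);
            if (idx < 0) {
                return false;
            }
            if (!mutableSchema) {
                schema = schema.copy(); // copy once, before the first modification
                mutableSchema = true;
            }
            schema.fieldNames.set(idx, to);
            values.put(to, values.remove(from));
            return true;
        }
    }

    public static void main(String[] args) {
        Schema shared = new Schema(List.of("name", "city"));
        Rec first = new Rec(shared);
        Rec second = new Rec(shared);
        first.values.put("name", "Alice");
        second.values.put("name", "Bob");

        first.rename("name", "newName");
        second.rename("name", "newName");

        System.out.println(shared.fieldNames);          // [name, city]
        System.out.println(second.schema().fieldNames); // [newName, city]
        System.out.println(second.values.get("newName")); // Bob
    }
}
```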
   
   2nd change: `MapRecord.regenerateSchema()` - Enhanced to recursively 
regenerate schemas for nested records within arrays.
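   
   The recursive regeneration can be pictured roughly as follows. This is an 
illustrative sketch only: the `Schema` and `Rec` types, the `put` helper, and the 
rule of taking the element schema from the first record in a list are all 
assumptions, not the behavior of the real `MapRecord.regenerateSchema()`.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RegenerateSchemaDemo {

    /** Hypothetical schema: field name -> nested schema, or null for scalar fields. */
    static final class Schema {
        final Map<String, Schema> fields = new LinkedHashMap<>();
    }

    /** Hypothetical stand-in for a record; this is NOT the real MapRecord. */
    static final class Rec {
        Schema schema = new Schema();
        final Map<String, Object> values = new LinkedHashMap<>();

        Rec put(String name, Object value) {
            values.put(name, value);
            return this;
        }

        /** Rebuilds this record's schema from its values, descending into children. */
        Schema regenerateSchema() {
            Schema rebuilt = new Schema();
            for (Map.Entry<String, Object> entry : values.entrySet()) {
                rebuilt.fields.put(entry.getKey(), nestedSchemaOf(entry.getValue()));
            }
            schema = rebuilt;
            return rebuilt;
        }

        private static Schema nestedSchemaOf(Object value) {
            if (value instanceof Rec) {
                return ((Rec) value).regenerateSchema();
            }
            if (value instanceof List) {
                Schema element = null;
                for (Object item : (List<?>) value) {
                    // Regenerate every element; keep the first nested schema found.
                    Schema nested = nestedSchemaOf(item);
                    if (element == null) {
                        element = nested;
                    }
                }
                return element;
            }
            return null; // scalar value: no nested schema
        }
    }

    public static void main(String[] args) {
        Rec address = new Rec().put("streetName", "Main St");
        Rec person = new Rec().put("addresses", List.of(address));

        person.regenerateSchema();
        System.out.println(person.schema.fields.get("addresses").fields.keySet());
    }
}
```

   After regeneration, the parent's entry for `addresses` points at the nested 
schema that contains the renamed field, so a writer walking the parent schema 
sees the new name.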
   
   3rd change: `RenameRecordField.process()` - Added a call to propagate nested 
schema changes to the parent record.
   
   Updated all existing test input files to use multiple records (2 per file) 
to properly test the multi-record scenario.
   
   ### Performance Considerations
   
   The fix adds a schema copy per record and a `regenerateSchema()` call, so 
there is some overhead, but it stays small for typical use cases.
   
   #### Performance Impact by Scenario
   
   | Scenario | Impact | Notes |
   |----------|--------|-------|
   | Single record per FlowFile | Negligible | One schema copy, one `regenerateSchema()` call |
   | Multiple records, top-level renames | Low | Schema copy once per record; `regenerateSchema()` iterates fields only |
   | Multiple records, nested renames | Low-Moderate | Same as above, plus nested record iteration |
   | Large arrays of records (1000+ elements) | Moderate | `regenerateSchema()` iterates all array elements |
   | Deeply nested structures (3+ levels) | Moderate-High | Recursive processing multiplies cost |
   | Very wide schemas (100+ fields) | Low-Moderate | Larger schema copies |
   
   #### Optimizations Applied
   
   - **`mutableSchema` flag**: Schema is copied only once per record, not once 
per rename operation
   - **Lazy evaluation**: Schema regeneration happens only after all renames 
are complete
   
   #### When Performance Is Not a Concern
   
   - Typical record structures (10-50 fields)
   - Moderate array sizes (< 1000 elements)  
   - 1-2 levels of nesting
   
   These represent the vast majority of real-world use cases.
   
   # Tracking
   
   Please complete the following tracking steps prior to pull request creation.
   
   ### Issue Tracking
   
   - [ ] [Apache NiFi Jira](https://issues.apache.org/jira/browse/NIFI) issue 
created
   
   ### Pull Request Tracking
   
   - [ ] Pull Request title starts with Apache NiFi Jira issue number, such as 
`NIFI-00000`
   - [ ] Pull Request commit message starts with Apache NiFi Jira issue number, 
such as `NIFI-00000`
   
   ### Pull Request Formatting
   
   - [ ] Pull Request based on current revision of the `main` branch
   - [ ] Pull Request refers to a feature branch with one commit containing 
changes
   
   # Verification
   
   Please indicate the verification steps performed prior to pull request 
creation.
   
   ### Build
   
   - [ ] Build completed using `./mvnw clean install -P contrib-check`
     - [ ] JDK 21
     - [ ] JDK 25
   
   ### Licensing
   
   - [ ] New dependencies are compatible with the [Apache License 
2.0](https://apache.org/licenses/LICENSE-2.0) according to the [License 
Policy](https://www.apache.org/legal/resolved.html)
   - [ ] New dependencies are documented in applicable `LICENSE` and `NOTICE` 
files
   
   ### Documentation
   
   - [ ] Documentation formatting appears as expected in rendered files
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
