RAVINARAYAN SINGH created NIFI-11858:
----------------------------------------
Summary: Improve column name normalization in PutDatabaseRecord
processor
Key: NIFI-11858
URL: https://issues.apache.org/jira/browse/NIFI-11858
Project: Apache NiFi
Issue Type: Improvement
Reporter: RAVINARAYAN SINGH
Assignee: RAVINARAYAN SINGH
**Current Behavior:**
When `column_translation` is set to true, the PutDatabaseRecord processor
normalizes column names by removing every underscore ("_"). As a result,
distinct names such as "Pic_1_11" and "Pic_11_1" collapse to the same
normalized name and are treated as the same column. This is rarely desired,
especially in databases that allow underscores as valid characters in column
names.
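The collision described above can be reproduced with a minimal sketch. The
class and the `stripUnderscores` helper are hypothetical stand-ins that model
only the underscore removal described in this issue; they are not the
processor's actual code.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class UnderscoreCollisionDemo {
    // Hypothetical stand-in for the normalization step: strip every underscore.
    static String stripUnderscores(String columnName) {
        return columnName.replace("_", "");
    }

    public static void main(String[] args) {
        // Maps each normalized name to the first original name that produced it.
        Map<String, String> seen = new LinkedHashMap<>();
        for (String name : List.of("Pic_1_11", "Pic_11_1")) {
            String key = stripUnderscores(name);
            String previous = seen.put(key, name);
            if (previous != null) {
                // Two distinct source columns produced the same normalized key.
                System.out.println(previous + " and " + name
                        + " both normalize to " + key);
            }
        }
    }
}
```

Running it reports that both names normalize to the same key, which is the
ambiguity this issue aims to remove.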
**Proposed Improvements:**
To address this issue and provide users with more control over the
normalization process, we propose the following improvements:
1. Allow users to specify their own regular expression: Instead of hard-coding
the normalization behavior, we can enhance the normalization function by
allowing users to pass a custom regular expression as the `column_translation`
parameter. This way, advanced users can define normalization rules that match
their database requirements.
2. Predefined normalization options: To simplify the process for users who
don't want to write their own regular expressions, we can provide a few
well-defined translation options, such as:
a. REMOVE_UNDERSCORE: This option will remove all underscores from the
column names.
b. REMOVE_ALL_SPECIAL_CHAR: This option will remove all special characters
(non-alphanumeric and non-space characters) from the column names.
c. REMOVE_SPACE: This option will remove all spaces from the column names.
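One way the two proposals could fit together is sketched below. The option
names come from the list above; the class name, the `applyCustom` helper, and
the exact patterns (for example, treating "alphanumeric" as ASCII letters and
digits) are assumptions for illustration, not the implementation attached to
this issue.

```java
import java.util.regex.Pattern;

class ColumnNameNormalizer {
    // Option names follow the proposal; the patterns themselves are assumed.
    enum Strategy {
        REMOVE_UNDERSCORE("_"),
        // Assumes ASCII letters and digits; everything else except spaces is removed.
        REMOVE_ALL_SPECIAL_CHAR("[^a-zA-Z0-9 ]"),
        REMOVE_SPACE(" ");

        private final Pattern pattern;

        Strategy(String regex) {
            this.pattern = Pattern.compile(regex);
        }

        // Removes every match of this strategy's pattern from the column name.
        String apply(String columnName) {
            return pattern.matcher(columnName).replaceAll("");
        }
    }

    // Hypothetical custom path: remove whatever the user-supplied regex matches.
    static String applyCustom(String columnName, String userRegex) {
        return Pattern.compile(userRegex).matcher(columnName).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(Strategy.REMOVE_UNDERSCORE.apply("Pic_1_11"));
        System.out.println(Strategy.REMOVE_ALL_SPECIAL_CHAR.apply("order-id #1"));
        System.out.println(Strategy.REMOVE_SPACE.apply("first name"));
        System.out.println(applyCustom("unit.price-usd", "[-.]"));
    }
}
```

Under this sketch, a user who needs underscores preserved could pick
REMOVE_SPACE or a custom pattern, so "Pic_1_11" and "Pic_11_1" would remain
distinct.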
**Expected Behavior:**
With these improvements, users will have more flexibility and control over the
normalization process when using the PutDatabaseRecord processor. They can
either choose a predefined normalization option or supply a custom regular
expression to suit their specific database requirements.
**Note:**
This improvement will enhance the usability and compatibility of the
PutDatabaseRecord processor with various database systems that have different
rules for column name normalization.