[jira] [Commented] (NIFI-11858) Improve column name normalization in PutDatabaseRecord processor

ASF subversion and git services (Jira) Fri, 13 Dec 2024 11:51:57 -0800


    [ https://issues.apache.org/jira/browse/NIFI-11858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17905582#comment-17905582 ]


ASF subversion and git services commented on NIFI-11858:
--------------------------------------------------------

Commit 502572b2f5911685a5baf8ce70a4c8f5f90b668b in nifi's branch 
refs/heads/main from ravisingh
[ https://gitbox.apache.org/repos/asf?p=nifi.git;h=502572b2f5 ]

NIFI-11858 Configurable Column Name Normalization in PutDatabaseRecord and 
UpdateDatabaseTable

Cleaned up and made the required changes for 
https://github.com/apache/nifi/pull/8995

Updated the description to reflect that column names are uppercased for 
case-insensitive matching irrespective of strategy

Added examples for REMOVE_ALL_SPECIAL_CHAR and PATTERN

Signed-off-by: Matt Burgess <[email protected]>

This closes #9382
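
The commit note about uppercasing for case-insensitive matching can be 
illustrated with a minimal sketch. This is not the code from the commit: the 
class ColumnMatcher, its constructor, and the match method are hypothetical 
names chosen for illustration. It assumes the chosen normalization is applied 
first and the result is then uppercased on both sides of the comparison.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.function.UnaryOperator;

// Hypothetical helper, not part of NiFi: matches record field names to table
// column names case-insensitively, whatever normalization is in use.
final class ColumnMatcher {
    private final Map<String, String> columnsByKey = new HashMap<>();
    private final UnaryOperator<String> normalizer;

    ColumnMatcher(List<String> tableColumns, UnaryOperator<String> normalizer) {
        this.normalizer = normalizer;
        for (String column : tableColumns) {
            // Later columns with the same key overwrite earlier ones.
            columnsByKey.put(key(column), column);
        }
    }

    // Apply the strategy first, then uppercase so the comparison ignores case.
    private String key(String name) {
        return normalizer.apply(name).toUpperCase(Locale.ROOT);
    }

    // Returns the actual table column for a record field name, if any.
    Optional<String> match(String fieldName) {
        return Optional.ofNullable(columnsByKey.get(key(fieldName)));
    }

    public static void main(String[] args) {
        List<String> columns = List.of("Pic_1_11", "ORDER_ID");

        // No normalization: only case differences are ignored.
        ColumnMatcher plain = new ColumnMatcher(columns, UnaryOperator.identity());
        System.out.println(plain.match("order_id"));   // Optional[ORDER_ID]
        System.out.println(plain.match("pic111"));     // Optional.empty

        // Underscore removal: "pic111" now matches "Pic_1_11".
        ColumnMatcher noUnderscore =
                new ColumnMatcher(columns, name -> name.replace("_", ""));
        System.out.println(noUnderscore.match("pic111")); // Optional[Pic_1_11]
    }
}
```

Uppercasing with Locale.ROOT keeps the key independent of the JVM's default 
locale, which is one reasonable choice here and again an assumption of the 
sketch.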


> Improve column name normalization in PutDatabaseRecord processor
> ----------------------------------------------------------------
>
>                 Key: NIFI-11858
>                 URL: https://issues.apache.org/jira/browse/NIFI-11858
>             Project: Apache NiFi
>          Issue Type: Improvement
>            Reporter: RAVINARAYAN SINGH
>            Assignee: RAVINARAYAN SINGH
>            Priority: Minor
>          Time Spent: 9h 20m
>  Remaining Estimate: 0h
>
> *Current Behavior:*
> When `column_translation` is set to true, the PutDatabaseRecord processor 
> removes underscores ("_") from column names. As a result, distinct names 
> such as "Pic_1_11" and "Pic_11_1" both normalize to "Pic111" and are treated 
> as the same column, which may not be desired, especially in databases that 
> allow underscores as valid characters in column names.
> *Proposed Improvements:*
> To address this issue and provide users with more control over the 
> normalization process, we propose the following improvements:
> 1. Allow users to specify their own regular expression: Instead of 
> hard-coding the normalization behavior, we can enhance the function by 
> allowing users to pass a custom regular expression as the 
> `column_translation` parameter. This way, advanced users can define their 
> specific normalization rules based on their database requirements.
> 2. Predefined normalization options: To simplify the process for users who 
> don't want to write their own regular expressions, we can provide some 
> well-defined translation options, such as:
>    a. REMOVE_UNDERSCORE: This option will remove all underscores from the 
> column names.
>    b. REMOVE_ALL_SPECIAL_CHAR: This option will remove all special characters 
> (non-alphanumeric and non-space characters) from the column names.
>    c. REMOVE_SPACE: This option will remove all spaces from the column names.
> *Expected Behavior:*
> With these improvements, users will have more flexibility and control over 
> the normalization process when using the PutDatabaseRecord processor. They 
> can either choose a predefined normalization option or specify a custom 
> regular expression to suit their specific database requirements.
> *Note:*
> This improvement will enhance the usability and compatibility of the 
> PutDatabaseRecord processor with various database systems that have different 
> rules for column name normalization.
>  
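
The options listed in the issue description can be sketched as follows. This 
is an illustration only, not the processor's implementation: the class 
ColumnNameNormalizer and the normalize method are hypothetical, the Strategy 
constants reuse the names from the issue and the commit message, and treating 
PATTERN as "remove every match of a user-supplied regular expression" is an 
assumption.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of the normalization options; not NiFi's actual API.
final class ColumnNameNormalizer {

    enum Strategy { REMOVE_UNDERSCORE, REMOVE_SPACE, REMOVE_ALL_SPECIAL_CHAR, PATTERN }

    // Per the issue text: everything that is neither alphanumeric nor a space.
    private static final Pattern SPECIAL_CHARS = Pattern.compile("[^a-zA-Z0-9 ]");

    // customPattern is only consulted for Strategy.PATTERN (assumed semantics:
    // every match of the pattern is removed from the column name).
    static String normalize(String name, Strategy strategy, Pattern customPattern) {
        switch (strategy) {
            case REMOVE_UNDERSCORE:
                return name.replace("_", "");
            case REMOVE_SPACE:
                return name.replace(" ", "");
            case REMOVE_ALL_SPECIAL_CHAR:
                return SPECIAL_CHARS.matcher(name).replaceAll("");
            case PATTERN:
                return customPattern.matcher(name).replaceAll("");
            default:
                throw new IllegalArgumentException("Unknown strategy: " + strategy);
        }
    }

    public static void main(String[] args) {
        // The collision from the issue: both names collapse to the same value.
        System.out.println(normalize("Pic_1_11", Strategy.REMOVE_UNDERSCORE, null)); // Pic111
        System.out.println(normalize("Pic_11_1", Strategy.REMOVE_UNDERSCORE, null)); // Pic111

        // REMOVE_SPACE leaves underscores alone, so the two names stay distinct.
        System.out.println(normalize("Pic_1_11", Strategy.REMOVE_SPACE, null));      // Pic_1_11

        System.out.println(normalize("unit_price($)", Strategy.REMOVE_ALL_SPECIAL_CHAR, null)); // unitprice

        // A custom pattern that strips only hyphens keeps underscores intact.
        System.out.println(normalize("Pic-1_11", Strategy.PATTERN, Pattern.compile("-"))); // Pic1_11
    }
}
```

The first two calls reproduce the ambiguity described under Current Behavior; 
the remaining calls show how choosing a different option, or a custom pattern, 
avoids it.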



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
