[ 
https://issues.apache.org/jira/browse/NIFI-11858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Burgess updated NIFI-11858:
--------------------------------
    Status: Patch Available  (was: Open)

> Improve column name normalization in PutDatabaseRecord processor
> ----------------------------------------------------------------
>
>                 Key: NIFI-11858
>                 URL: https://issues.apache.org/jira/browse/NIFI-11858
>             Project: Apache NiFi
>          Issue Type: Improvement
>            Reporter: RAVINARAYAN SINGH
>            Assignee: RAVINARAYAN SINGH
>            Priority: Minor
>          Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> *Current Behavior:*
> When `column_translation` is set to true, the PutDatabaseRecord processor 
> normalizes column names by removing every underscore ("_"). As a result, 
> distinct column names such as "Pic_1_11" and "Pic_11_1" normalize to the same 
> value and are treated as the same column. This may not be desired, 
> especially in databases that allow underscores as valid characters in 
> column names.
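
To make the collision concrete, here is a minimal Java sketch. The class and method names are hypothetical, and `normalize` is only a stand-in for the behavior described above (underscore removal); it is not the processor's actual code.

```java
// Hypothetical stand-in for the normalization described in this issue:
// every underscore is removed from the column name.
class UnderscoreCollision {

    static String normalize(String columnName) {
        return columnName.replace("_", "");
    }

    public static void main(String[] args) {
        String first = normalize("Pic_1_11");
        String second = normalize("Pic_11_1");

        // Both distinct column names collapse to the same normalized value.
        System.out.println(first);                 // Pic111
        System.out.println(second);                // Pic111
        System.out.println(first.equals(second));  // true
    }
}
```

Any lookup keyed on the normalized name can no longer tell these two columns apart.
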
> *Proposed Improvements:*
> To address this issue and provide users with more control over the 
> normalization process, we propose the following improvements:
> 1. Allow users to specify their own regular expression: Instead of 
> hard-coding the normalization behavior, we can allow users to pass a custom 
> regular expression as the `column_translation` parameter. This way, advanced 
> users can define normalization rules that match their database 
> requirements.
> 2. Predefined normalization options: To simplify the process for users who 
> don't want to write their own regular expressions, we can provide some 
> well-defined translation options, such as:
>    a. REMOVE_UNDERSCORE: This option will remove all underscores from the 
> column names.
>    b. REMOVE_ALL_SPECIAL_CHAR: This option will remove all special characters 
> (characters that are neither alphanumeric nor spaces) from the column names.
>    c. REMOVE_SPACE: This option will remove all spaces from the column names.
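
The two proposals above could be sketched roughly as follows in Java. Only the three option names come from this issue; the class name, the `apply` and `normalize` methods, and the exact patterns are assumptions made for illustration, not the API of the attached patch.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: predefined strategies plus a user-supplied pattern.
class ColumnNameNormalizer {

    enum Strategy {
        // Removes every underscore.
        REMOVE_UNDERSCORE(Pattern.compile("_")),
        // Removes every character that is neither ASCII alphanumeric nor a space
        // (an assumed reading of "special characters").
        REMOVE_ALL_SPECIAL_CHAR(Pattern.compile("[^A-Za-z0-9 ]")),
        // Removes every space character.
        REMOVE_SPACE(Pattern.compile(" "));

        private final Pattern pattern;

        Strategy(Pattern pattern) {
            this.pattern = pattern;
        }

        String apply(String columnName) {
            return pattern.matcher(columnName).replaceAll("");
        }
    }

    // Predefined option: delegate to the chosen strategy.
    static String normalize(String columnName, Strategy strategy) {
        return strategy.apply(columnName);
    }

    // Custom option: remove everything the user's regular expression matches.
    static String normalize(String columnName, Pattern customPattern) {
        return customPattern.matcher(columnName).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(normalize("Pic_1_11", Strategy.REMOVE_UNDERSCORE));        // Pic111
        System.out.println(normalize("Pic_1 #11", Strategy.REMOVE_ALL_SPECIAL_CHAR)); // Pic1 11
        System.out.println(normalize("Pic 1 11", Strategy.REMOVE_SPACE));             // Pic111
        // A custom pattern that keeps underscores but drops other punctuation.
        System.out.println(normalize("Pic-1_11", Pattern.compile("[^A-Za-z0-9_]")));  // Pic1_11
    }
}
```

With a custom pattern that preserves underscores, "Pic_1_11" and "Pic_11_1" stay distinct, which is the case this issue is trying to support.
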
> *Expected Behavior:*
> With these improvements, users will have more flexibility and control over 
> the normalization process when using the PutDatabaseRecord processor. They 
> can either choose a predefined normalization option or specify a custom 
> regular expression to suit their specific database requirements.
> *Note:*
> This improvement will enhance the usability and compatibility of the 
> PutDatabaseRecord processor with various database systems that have different 
> rules for column name normalization.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
