[ 
https://issues.apache.org/jira/browse/NIFI-4550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Handermann reassigned NIFI-4550:
--------------------------------------

    Assignee: endzeit

> Add an InferCharacterSet processor
> ----------------------------------
>
>                 Key: NIFI-4550
>                 URL: https://issues.apache.org/jira/browse/NIFI-4550
>             Project: Apache NiFi
>          Issue Type: New Feature
>          Components: Extensions
>            Reporter: Matt Burgess
>            Assignee: endzeit
>            Priority: Minor
>          Time Spent: 2h
>  Remaining Estimate: 0h
>
> Sometimes in a NiFi flow it is not known what character set an incoming flow 
> file uses. This can complicate downstream processing if the processors 
> expect a particular charset (whether or not the user can configure it). 
> There is a ConvertCharacterSet processor, but it requires an explicit 
> value for Input Character Set, which might not be known.
> I propose an InferCharacterSet processor, which would presumably use some 
> license-friendly third-party library (there is a discussion 
> [here|https://stackoverflow.com/questions/499010/java-how-to-determine-the-correct-charset-encoding-of-a-stream])
>  to guess the character set, perhaps adding it as an attribute for use 
> downstream in ConvertCharacterSet.
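
As a rough illustration of the "guess the character set" step described above, here is a minimal sketch that uses only the JDK rather than a third-party detection library. It is not the NiFi processor API and not the implementation attached to this issue: the class name CharsetGuesser, the candidate list, and the candidate order are all assumptions made for the example. It checks for a byte-order mark, then returns the first candidate charset that decodes the bytes without errors.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical helper, not part of NiFi: a naive stand-in for a real detection library.
final class CharsetGuesser {

    // Assumed candidate order, strictest first. ISO-8859-1 maps every byte to a
    // character, so it always succeeds and acts as the fallback.
    private static final List<Charset> CANDIDATES = List.of(
            StandardCharsets.US_ASCII,
            StandardCharsets.UTF_8,
            StandardCharsets.ISO_8859_1);

    // Returns true if the bytes decode in the given charset with no malformed
    // or unmappable input.
    static boolean decodesStrictly(byte[] data, Charset charset) {
        try {
            charset.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    static Charset guess(byte[] data) {
        // A byte-order mark, when present, is the strongest hint available.
        if (data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF) {
            return StandardCharsets.UTF_8;
        }
        if (data.length >= 2) {
            int b0 = data[0] & 0xFF;
            int b1 = data[1] & 0xFF;
            if (b0 == 0xFE && b1 == 0xFF) {
                return StandardCharsets.UTF_16BE;
            }
            // Note: FF FE can also begin a UTF-32LE BOM; this sketch ignores that case.
            if (b0 == 0xFF && b1 == 0xFE) {
                return StandardCharsets.UTF_16LE;
            }
        }
        // Otherwise take the first candidate that decodes cleanly.
        for (Charset candidate : CANDIDATES) {
            if (decodesStrictly(data, candidate)) {
                return candidate;
            }
        }
        return StandardCharsets.ISO_8859_1;
    }

    public static void main(String[] args) {
        byte[] sample = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);
        // In a processor, the resulting name could be written to a flow file attribute.
        System.out.println(guess(sample).name());
    }
}
```

Strict decoding can only rule charsets out; it cannot tell apart single-byte encodings that accept the same bytes, which is why a statistical detection library, as the description suggests, would likely be the better basis for a real processor.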



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
