[ https://issues.apache.org/jira/browse/NIFI-4550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16238451#comment-16238451 ]
Michael Moser commented on NIFI-4550:
-------------------------------------

Perhaps somewhat related to NIFI-1874?

> Add an InferCharacterSet processor
> ----------------------------------
>
>                 Key: NIFI-4550
>                 URL: https://issues.apache.org/jira/browse/NIFI-4550
>             Project: Apache NiFi
>          Issue Type: New Feature
>          Components: Extensions
>            Reporter: Matt Burgess
>            Priority: Minor
>
> Sometimes in a NiFi flow it is not known what character set an incoming flow
> file is using. This can make it difficult for downstream processing if the
> processors expect a particular charset (whether the user can configure it or
> not). There is a ConvertCharacterSet processor, but it expects an explicit
> value for Input Character Set, when this might not be known.
> I propose an InferCharacterSet processor, which would presumably use some
> license-friendly third-party library (there is a discussion
> [here|https://stackoverflow.com/questions/499010/java-how-to-determine-the-correct-charset-encoding-of-a-stream])
> to guess the character set, perhaps adding it as an attribute for use
> downstream in ConvertCharacterSet.
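A minimal sketch of the guessing step, for illustration only. It uses just the JDK rather than the third-party library the issue proposes: it first checks for a byte-order mark, then trial-decodes the content with a strict decoder against a hypothetical candidate list (UTF-8 first, ISO-8859-1 as a catch-all). The class `CharsetGuess` and the candidate order are assumptions, not NiFi code; in a processor, the returned charset name would be written to a flow file attribute for ConvertCharacterSet to consume.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical helper: a naive trial-decode heuristic, not a statistical detector.
public class CharsetGuess {

    // Hypothetical candidate order: strict UTF-8 first, then ISO-8859-1,
    // which maps all 256 byte values and therefore always decodes.
    private static final List<Charset> CANDIDATES =
            List.of(StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);

    public static Charset guess(byte[] data) {
        // 1. A byte-order mark, if present, identifies a Unicode encoding directly.
        if (startsWith(data, 0xEF, 0xBB, 0xBF)) return StandardCharsets.UTF_8;
        if (startsWith(data, 0xFE, 0xFF)) return StandardCharsets.UTF_16BE;
        if (startsWith(data, 0xFF, 0xFE)) return StandardCharsets.UTF_16LE;

        // 2. Otherwise return the first candidate that decodes without errors.
        for (Charset cs : CANDIDATES) {
            CharsetDecoder decoder = cs.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT);
            try {
                decoder.decode(ByteBuffer.wrap(data));
                return cs;
            } catch (CharacterCodingException e) {
                // Not valid in this charset; try the next candidate.
            }
        }
        throw new IllegalStateException("no candidate charset matched");
    }

    // True if data begins with the given unsigned byte values.
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] utf8 = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(guess(utf8));   // UTF-8
        System.out.println(guess(latin1)); // ISO-8859-1
    }
}
```

Trial decoding can only rule charsets out, so it cannot distinguish between single-byte charsets, where almost any byte sequence decodes cleanly. That limitation is why the issue leans toward a dedicated detection library of the kind discussed in the linked thread.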