[ 
https://issues.apache.org/jira/browse/FLINK-36627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17894945#comment-17894945
 ] 

Baiqing Lyu commented on FLINK-36627:
-------------------------------------

I took a brief look at this problem, and it seems that the current 
_CsvReaderFormat_ class does not expose a way for users to specify a character 
encoding.
One potential solution would be to add new _forPojo_ and _forSchema_ 
builders that accept a charset option; something like 
_org.apache.commons.io.Charsets_ would probably work here.
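
To illustrate the mismatch outside of Flink, here is a minimal JDK-only sketch. 
It is not Flink code, and the class and method names (_CharsetMismatchSketch_, 
_isValidUtf8_, _latin1ToUtf8_) and the sample text are made up for this example. 
It assumes the file is ISO-8859-1: a byte such as 0xD1 looks like the start of a 
two-byte UTF-8 sequence, so a following plain ASCII byte such as 0x41 is 
rejected by a strict UTF-8 decoder.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration only; not part of Flink.
final class CharsetMismatchSketch {

    // Returns true only if the bytes form well-formed UTF-8.
    // REPORT makes the decoder fail instead of substituting U+FFFD.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Decodes the bytes as ISO-8859-1 and re-encodes them as UTF-8.
    static byte[] latin1ToUtf8(byte[] latin1) {
        return new String(latin1, StandardCharsets.ISO_8859_1)
                .getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "\u00D1" (N with tilde) is the single byte 0xD1 in ISO-8859-1,
        // followed here by 'A' (0x41).
        byte[] latin1 = "1,PE\u00D1A S.A.\n".getBytes(StandardCharsets.ISO_8859_1);

        System.out.println("raw bytes valid UTF-8: " + isValidUtf8(latin1));
        System.out.println("transcoded bytes valid UTF-8: " + isValidUtf8(latin1ToUtf8(latin1)));
    }
}
```

Until a charset option exists, converting the file to UTF-8 along these lines 
before handing it to the source might serve as a workaround, assuming the 
source encoding is known in advance.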

Finally, the question for existing members is whether this is a necessary 
addition, or whether it is something that is not expected to be supported.

I'm new to contributing. After reviewing the code contribution process, I 
figured commenting here is appropriate. Let me know if I should be using the 
mailing list or another channel instead.

> Failure to process a CSV file in Flink due to a character encoding mismatch: 
> the file is in ISO-8859 and the application expects UTF-8.
> ---------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-36627
>                 URL: https://issues.apache.org/jira/browse/FLINK-36627
>             Project: Flink
>          Issue Type: Bug
>            Reporter: Hector Miuler Malpica Gallegos
>            Priority: Major
>
> I get an error when reading a CSV file encoded in ISO-8859. The error is the following:
> Caused by: java.io.CharConversionException: Invalid UTF-8 middle byte 0x41 (at char #1247, byte #1246): check content encoding, does not look like UTF-8
>     at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.dataformat.csv.impl.UTF8Reader.reportInvalidOther(UTF8Reader.java:520)
>     at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.dataformat.csv.impl.UTF8Reader.reportDeferredInvalid(UTF8Reader.java:531)
>     at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.dataformat.csv.impl.UTF8Reader.read(UTF8Reader.java:177)
>     at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.dataformat.csv.impl.CsvDecoder.loadMore(CsvDecoder.java:458)
>     at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.dataformat.csv.impl.CsvDecoder._nextUnquotedString(CsvDecoder.java:782)
>     at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.dataformat.csv.impl.CsvDecoder.nextString(CsvDecoder.java:732)
>     at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.dataformat.csv.CsvParser._handleNextEntry(CsvParser.java:963)
>     at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.dataformat.csv.CsvParser.nextFieldName(CsvParser.java:763)
>     at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.deser.BeanDeserializer.vanillaDeserialize(BeanDeserializer.java:321)
>     at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:177)
>     at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.MappingIterator.nextValue(MappingIterator.java:283)
>     at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.MappingIterator.next(MappingIterator.java:199)
>     ... 11 more
>  
>  
> My code is the following:
> val env = StreamExecutionEnvironment.createLocalEnvironment()
> val csvFormat = CsvReaderFormat.forPojo(Empresa::class.java)
> val csvSource = FileSource
>     .forRecordStreamFormat(csvFormat, Path("/miuler/PadronRUC_202410.csv"))
>     .build()
> val empresaStreamSource = env.fromSource(csvSource, WatermarkStrategy.noWatermarks(), "CSV Source")
> empresaStreamSource.print()
> env.execute("Load CSV")
>  
>  
> My dependencies:
> val kotlinVersion = "1.20.0"
> dependencies {
>     implementation("org.apache.flink:flink-shaded-jackson:2.15.3-19.0")
>     implementation("org.apache.flink:flink-core:$kotlinVersion")
>     implementation("org.apache.flink:flink-runtime:$kotlinVersion")
>     implementation("org.apache.flink:flink-runtime-web:$kotlinVersion")
>     implementation("org.apache.flink:flink-clients:$kotlinVersion")
>     implementation("org.apache.flink:flink-streaming-java:$kotlinVersion")
>     implementation("org.apache.flink:flink-csv:$kotlinVersion")
>     implementation("org.apache.flink:flink-connector-base:$kotlinVersion")
>     implementation("org.apache.flink:flink-connector-files:$kotlinVersion")
> }
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
