fhueske commented on a change in pull request #6823: [FLINK-10134] UTF-16 
support for TextInputFormat bug refixed
URL: https://github.com/apache/flink/pull/6823#discussion_r225531993
 
 

 ##########
 File path: 
flink-core/src/main/java/org/apache/flink/api/common/io/DelimitedInputFormat.java
 ##########
 @@ -472,6 +498,7 @@ public void open(FileInputSplit split) throws IOException {
 
                this.offset = splitStart;
                if (this.splitStart != 0) {
+                       setBomFileCharset(split);
 
 Review comment:
   We can move the BOM configuration out of the condition. The following logic 
should be applied:
   
   1. Check whether a UTF charset or no charset was configured (by default, we 
assume UTF-8). If a different charset is explicitly configured, skip the BOM 
check.
   2. Seek to position 0.
   3. Try to fetch a BOM.
   4. If `(splitStart != 0)`, seek to the beginning of the split.
   
   We configure the charset depending on the BOM:
   * If we find a BOM, we configure the corresponding UTF charset.
   * If we don't find a BOM, UTF-16 and UTF-32 are converted into UTF-16BE / 
UTF-32BE (BE is the assumed default).
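   
   The charset decision described above could be sketched roughly as follows. 
This is only an illustration, not the PR's code: the class `BomCharsetResolver` 
and its `resolve` method are hypothetical names, `head` is assumed to hold the 
first bytes read after seeking to position 0, and the result is returned as a 
charset name so the sketch does not depend on which charsets the JVM ships. 
Seeking back to `splitStart` is left to the caller.

```java
import java.util.Locale;

/** Hypothetical helper (not part of Flink): picks a charset name from a BOM. */
public final class BomCharsetResolver {

    private BomCharsetResolver() {}

    /**
     * @param configured charset name set by the user, or null if none was set
     * @param head       first bytes of the file (position 0); may be shorter than 4
     * @return the charset name that should be used for decoding
     */
    public static String resolve(String configured, byte[] head) {
        // Assumption from the review comment: no configured charset means UTF-8.
        String name = configured == null ? "UTF-8" : configured.toUpperCase(Locale.ROOT);

        // Step 1: a non-UTF charset was configured explicitly, skip the BOM check.
        if (!name.startsWith("UTF")) {
            return configured;
        }

        // Step 3: try to match a BOM. The 4-byte UTF-32 marks are tested first
        // because the UTF-32LE mark starts with the UTF-16LE mark (FF FE).
        // Note: FF FE 00 00 could also be UTF-16LE followed by U+0000; this
        // sketch does not try to disambiguate that case.
        if (startsWith(head, 0x00, 0x00, 0xFE, 0xFF)) return "UTF-32BE";
        if (startsWith(head, 0xFF, 0xFE, 0x00, 0x00)) return "UTF-32LE";
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (startsWith(head, 0xFE, 0xFF)) return "UTF-16BE";
        if (startsWith(head, 0xFF, 0xFE)) return "UTF-16LE";

        // No BOM: endianness-neutral names fall back to big endian.
        if (name.equals("UTF-16")) return "UTF-16BE";
        if (name.equals("UTF-32")) return "UTF-32BE";
        return name;
    }

    /** Returns true if data begins with the given unsigned byte values. */
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

   In `open()`, such a helper would be called once before the 
`splitStart != 0` branch, so every split of the same file ends up with the 
same charset regardless of where it starts.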
   
