fhueske commented on a change in pull request #6823: [FLINK-10134] UTF-16
support for TextInputFormat bug refixed
URL: https://github.com/apache/flink/pull/6823#discussion_r225553773
##########
File path:
flink-core/src/main/java/org/apache/flink/api/common/io/DelimitedInputFormat.java
##########
@@ -62,26 +64,41 @@
// Charset is not serializable
private transient Charset charset;
+ /**
+ * The charset of bom in the file to process.
+ */
+ private transient Charset bomCharset;
+
+ /**
+ * The Map to record the BOM encoding of all files.
+ */
+ private transient final Map<String, Charset> fileBomCharsetMap;
Review comment:
I would bound the size of the map to something like 1024 entries, since it
otherwise grows with the number of files that are read. Once the map exceeds
that limit, we should start evicting entries.
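
One possible way to get this behavior, as a minimal sketch only: a `LinkedHashMap` in access order that overrides `removeEldestEntry`, so the least recently used entry is dropped once the limit is exceeded. The class name `BoundedBomCharsetCache` and the `maxEntries` parameter are hypothetical and not part of the PR. LRU eviction is my assumption; the comment above only asks that entries be removed.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical helper (not in the PR): a size-bounded map from file path
 * to the charset detected from that file's BOM.
 */
class BoundedBomCharsetCache extends LinkedHashMap<String, Charset> {
    private static final long serialVersionUID = 1L;

    // Upper bound on the number of entries, e.g. 1024 as suggested above.
    private final int maxEntries;

    BoundedBomCharsetCache(int maxEntries) {
        // accessOrder = true: iteration order runs from least to most
        // recently accessed, so the "eldest" entry is the LRU one.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<String, Charset> eldest) {
        // Called by put() after each insertion; returning true evicts
        // the eldest entry.
        return size() > maxEntries;
    }
}
```

With this sketch the field could be created as `new BoundedBomCharsetCache(1024)`. Note that `LinkedHashMap` is not thread-safe, so this assumes the map is only touched by a single thread per input format instance.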