XuQianJin-Stars commented on a change in pull request #6710: [FLINK-10134]
UTF-16 support for TextInputFormat bug fixed
URL: https://github.com/apache/flink/pull/6710#discussion_r223929236
##########
File path:
flink-core/src/main/java/org/apache/flink/api/common/io/FileInputFormat.java
##########
@@ -601,41 +602,44 @@ public LocatableInputSplitAssigner getInputSplitAssigner(FileInputSplit[] splits
if (unsplittable) {
int splitNum = 0;
for (final FileStatus file : files) {
+ String bomCharsetName = getBomCharset(file);
Review comment:
On the question of whether to detect the encoding of a file or byte stream
that has no BOM, consider the following scenario:
The file is actually encoded as UTF-8, but the user specifies UTF-16 or
UTF-32 as the encoding, so the content is still decoded incorrectly and comes
out garbled. On the other hand, reliably determining the encoding of a file or
byte stream that has no BOM is difficult.
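
To make the scenario concrete, here is a minimal sketch of BOM sniffing with a
fallback to the user-specified charset. It is not the code in this PR: the
class `BomSniffer` and the method `detectBomCharsetName` are hypothetical names
used only for illustration, and the sketch assumes the first few bytes of the
file have already been read into a byte array.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of Flink.
final class BomSniffer {

    /**
     * Returns the charset name implied by a leading BOM, or the
     * user-specified fallback when the bytes carry no BOM.
     */
    static String detectBomCharsetName(byte[] head, String fallback) {
        // Check the 4-byte UTF-32 BOMs first: FF FE 00 00 also begins with
        // the 2-byte UTF-16LE BOM (FF FE). This remains a heuristic, since a
        // UTF-16LE file whose first character is NUL has the same prefix.
        if (startsWith(head, 0x00, 0x00, 0xFE, 0xFF)) return "UTF-32BE";
        if (startsWith(head, 0xFF, 0xFE, 0x00, 0x00)) return "UTF-32LE";
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (startsWith(head, 0xFE, 0xFF)) return "UTF-16BE";
        if (startsWith(head, 0xFF, 0xFE)) return "UTF-16LE";
        // No BOM: nothing in the bytes contradicts the user's choice, so a
        // wrong user-specified charset goes unnoticed here.
        return fallback;
    }

    /** Compares the leading bytes of data (as unsigned values) with prefix. */
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    /** Shows the mismatch case: BOM-less UTF-8 bytes decoded as UTF-16. */
    static String decodeUtf8BytesAsUtf16(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_16);
    }
}
```

With a BOM present, the sniffed charset can override the user's setting. For a
BOM-less UTF-8 file with a user-specified UTF-16, the sketch simply returns
"UTF-16", and decoding produces garbled text, which is the case described
above.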