[ https://issues.apache.org/jira/browse/HDFS-4678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13906557#comment-13906557 ]
Jiqiu commented on HDFS-4678:
-----------------------------
The JNI specification says:
"There are two differences between this format and the "standard" UTF-8
format. First, the null byte (byte)0 is encoded using the two-byte
format rather than the one-byte format. This means that Java VM UTF-8
strings never have embedded nulls. Second, only the one-byte, two-byte,
and three-byte formats are used. The Java VM does not recognize the
longer UTF-8 formats."
That is why some Japanese characters cannot be converted. One example is 𠀋 (U+2000B), which takes 4 bytes in standard UTF-8: \xF0\xA0\x80\x8B.
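The sketch below shows one possible conversion. It does not come from libhdfs. It rewrites each 4-byte UTF-8 sequence as a UTF-16 surrogate pair and writes each half in the 3-byte form, which is how modified UTF-8 represents supplementary characters. For 𠀋 (U+2000B) the surrogates are U+D840 U+DC0B, so the output bytes are \xED\xA1\x80\xED\xB0\x8B.

A few assumptions apply:
- `utf8_to_modified_utf8` is a hypothetical helper, not part of libhdfs.
- The input is assumed to be NUL-terminated, so the two-byte NUL case never arises.
- 1- to 3-byte sequences are copied unchanged and are not validated.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of libhdfs): converts a NUL-terminated
 * standard UTF-8 string to JNI modified UTF-8. Returns a malloc'd string
 * the caller must free, or NULL on a malformed 4-byte sequence or on
 * allocation failure. */
char *utf8_to_modified_utf8(const char *in)
{
    size_t len = strlen(in);
    /* Each 4-byte sequence grows to 6 bytes, so 2x is a safe upper bound. */
    unsigned char *out = malloc(len * 2 + 1);
    if (out == NULL)
        return NULL;

    const unsigned char *p = (const unsigned char *)in;
    unsigned char *q = out;
    while (*p != '\0') {
        if (p[0] >= 0xF0 && p[0] <= 0xF4) {
            /* 4-byte lead byte: check continuation bytes. The || chain
             * short-circuits, so we never read past the terminating NUL. */
            if ((p[1] & 0xC0) != 0x80 || (p[2] & 0xC0) != 0x80 ||
                (p[3] & 0xC0) != 0x80) {
                free(out);
                return NULL;
            }
            uint32_t cp = ((uint32_t)(p[0] & 0x07) << 18) |
                          ((uint32_t)(p[1] & 0x3F) << 12) |
                          ((uint32_t)(p[2] & 0x3F) << 6) |
                          (uint32_t)(p[3] & 0x3F);
            if (cp < 0x10000 || cp > 0x10FFFF) {
                free(out);
                return NULL;
            }
            /* Split into UTF-16 surrogates, then write each as 3 bytes. */
            cp -= 0x10000;
            uint32_t units[2] = { 0xD800 + (cp >> 10), 0xDC00 + (cp & 0x3FF) };
            for (int i = 0; i < 2; i++) {
                *q++ = (unsigned char)(0xE0 | (units[i] >> 12));
                *q++ = (unsigned char)(0x80 | ((units[i] >> 6) & 0x3F));
                *q++ = (unsigned char)(0x80 | (units[i] & 0x3F));
            }
            p += 4;
        } else {
            /* 1- to 3-byte sequences are identical in both encodings;
             * this sketch copies them without validation. */
            *q++ = *p++;
        }
    }
    *q = '\0';
    return (char *)out;
}
```

This sketch assumes libhdfs builds its Java strings with NewStringUTF. It does not settle where the conversion belongs. One option is to run it inside libhdfs before that call. Another is to build the jstring from UTF-16 with NewString instead.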
> libhdfs casts Japanese character incorrectly to Java API
> ----------------------------------------------------------
>
> Key: HDFS-4678
> URL: https://issues.apache.org/jira/browse/HDFS-4678
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: libhdfs
> Affects Versions: 1.1.2
> Environment: Platform: Linux64
> Locale: Japanese (ja_JP.UTF-8)
> Reporter: Jiqiu
> Priority: Minor
>
> Putting a file with Japanese characters into HDFS through libhdfs produces
> a file that cannot be recognized when browsing HDFS.
> Here is test.c:
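> #include "hdfs.h"
> #include <fcntl.h>   /* O_WRONLY, O_CREAT */
> #include <locale.h>
> #include <stdio.h>
> #include <stdlib.h>  /* exit */
> #include <string.h>  /* strlen */
>
> int main(int argc, char **argv) {
>     /* The reported environment uses ja_JP.UTF-8; a bare "ja_JP" may not exist. */
>     if (!setlocale(LC_CTYPE, "ja_JP.UTF-8")) {
>         printf("Can not set locale type\n");
>     }
>     printf("0\n");
>     hdfsFS fs = hdfsConnect("localhost", 9000);
>     if (!fs) {
>         fprintf(stderr, "Failed to connect to HDFS\n");
>         exit(-1);
>     }
>     printf("1\n");
>     /* U+2000B encoded as standard 4-byte UTF-8 */
>     const char* writePath = "/tmp/\xF0\xA0\x80\x8B.txt";
>     printf("2\n");
>     hdfsFile writeFile = hdfsOpenFile(fs, writePath, O_WRONLY|O_CREAT, 0, 0, 0);
>     if (!writeFile) {
>         fprintf(stderr, "Failed to open %s for writing!\n", writePath);
>         exit(-1);
>     }
>     const char* buffer = "Hello, World! \xF0\xA0\x80\x8B";
>     /* strlen(buffer)+1 also writes the trailing NUL byte */
>     tSize num_written_bytes = hdfsWrite(fs, writeFile, (void*)buffer,
>                                         strlen(buffer)+1);
>     if (num_written_bytes < 0) {
>         fprintf(stderr, "Failed to write %s\n", writePath);
>         exit(-1);
>     }
>     if (hdfsFlush(fs, writeFile)) {
>         fprintf(stderr, "Failed to 'flush' %s\n", writePath);
>         exit(-1);
>     }
>     printf("3\n");
>     hdfsCloseFile(fs, writeFile);
>     hdfsDisconnect(fs);
>     return 0;
> }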
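A caller can spot affected paths before they reach libhdfs by scanning for 4-byte UTF-8 lead bytes (0xF0 to 0xF4). These bytes mark supplementary characters such as 𠀋, which modified UTF-8 cannot carry in their 4-byte form. `has_supplementary_chars` is a hypothetical check, not a libhdfs API. It looks only at lead bytes and does not validate the rest of each sequence.

```c
#include <stdbool.h>

/* Hypothetical pre-flight check (not part of libhdfs): returns true if the
 * NUL-terminated UTF-8 string s contains a 4-byte lead byte (0xF0-0xF4),
 * i.e. a character outside the Basic Multilingual Plane. */
bool has_supplementary_chars(const char *s)
{
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; p++) {
        if (*p >= 0xF0 && *p <= 0xF4)
            return true;
    }
    return false;
}
```

For the test program, `has_supplementary_chars(writePath)` returns true for `/tmp/\xF0\xA0\x80\x8B.txt`. A 3-byte Japanese character such as あ (\xE3\x81\x82) does not trigger it.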
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)