[ https://issues.apache.org/jira/browse/HDDS-10571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17838533#comment-17838533 ]
Tanvi Penumudy edited comment on HDDS-10571 at 4/18/24 8:43 AM:
----------------------------------------------------------------

After (re)setting the environment settings below on a test cluster, I was able to get both the *sh* and *fs* commands to work with Chinese-character filenames:
{code:java}
export JAVA_TOOL_OPTIONS="-Dfile.encoding=UTF-8 -Dsun.jnu.encoding=UTF-8"
localedef -i en_US -f UTF-8 en_US.UTF-8
localedef -i zh_CN -f UTF-8 zh_CN.UTF-8
export LANG=en_US.UTF-8
export LC_CTYPE="zh_CN.UTF-8"
export LC_MESSAGES="zh_CN.UTF-8"
export LC_ALL="zh_CN.UTF-8"

locale
LANG=en_US.UTF-8
LC_CTYPE="zh_CN.UTF-8"
LC_NUMERIC="zh_CN.UTF-8"
LC_TIME="zh_CN.UTF-8"
LC_COLLATE="zh_CN.UTF-8"
LC_MONETARY="zh_CN.UTF-8"
LC_MESSAGES="zh_CN.UTF-8"
LC_PAPER="zh_CN.UTF-8"
LC_NAME="zh_CN.UTF-8"
LC_ADDRESS="zh_CN.UTF-8"
LC_TELEPHONE="zh_CN.UTF-8"
LC_MEASUREMENT="zh_CN.UTF-8"
LC_IDENTIFICATION="zh_CN.UTF-8"
LC_ALL=zh_CN.UTF-8

locale charmap
UTF-8
{code}
The above settings ensure that the system has locale files available for both the English (United States) and Chinese (Simplified, China) locales with UTF-8 encoding, which is needed to display text in these languages.

After (re)setting the above properties, the sh and fs commands (including, but not limited to, the ones below) work fine, if they did not already.

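As a rough sketch of how the settings above could be verified before retrying the commands, the shell helper below checks that an options string carries both `-Dfile.encoding=UTF-8` and `-Dsun.jnu.encoding=UTF-8`, and that the reported charmap is UTF-8. The function names `has_utf8_opt` and `check_utf8_env` are hypothetical (they are not part of Ozone), and a POSIX shell is assumed.

```shell
#!/bin/sh
# Sketch only: these helpers are hypothetical and not part of Ozone.

# Succeeds when the options string ($2) sets the JVM property ($1) to UTF-8.
has_utf8_opt() {
  # Pad with spaces so the first and last options match the same pattern.
  case " $2 " in
    *" -D$1=UTF-8 "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Prints one line per check; returns non-zero if any check fails.
# $1 = JAVA_TOOL_OPTIONS string, $2 = charmap name (e.g. from `locale charmap`)
check_utf8_env() {
  status=0
  for prop in file.encoding sun.jnu.encoding; do
    if has_utf8_opt "$prop" "$1"; then
      echo "ok: -D$prop=UTF-8"
    else
      echo "missing: -D$prop=UTF-8"
      status=1
    fi
  done
  if [ "$2" = "UTF-8" ]; then
    echo "ok: charmap is UTF-8"
  else
    echo "unexpected charmap: $2"
    status=1
  fi
  return "$status"
}
```

Against the live environment it could be called as `check_utf8_env "${JAVA_TOOL_OPTIONS:-}" "$(locale charmap)"`; passing the values as arguments keeps the helper easy to try with sample strings.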
*[sh commands] OBS Bucket:*

Key List:
{code:java}
ozone sh key list /s3v/chinese-bucket
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8 -Dsun.jnu.encoding=UTF-8
[ {
  "volumeName" : "s3v",
  "bucketName" : "chinese-bucket",
  "name" : "客客客.pdf",
  "dataSize" : 28,
  "creationTime" : "2024-03-27T04:57:36.879Z",
  "modificationTime" : "2024-03-27T04:57:38.061Z",
  "replicationConfig" : {
    "replicationFactor" : "THREE",
    "requiredNodes" : 3,
    "replicationType" : "RATIS"
  },
  "metadata" : { },
  "file" : true
} ]
{code}
Key Delete:
{code:java}
ozone sh key delete /s3v/chinese-bucket/客客客.pdf
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8 -Dsun.jnu.encoding=UTF-8

ozone sh key list /s3v/chinese-bucket
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8 -Dsun.jnu.encoding=UTF-8
[ ]
{code}

*[fs commands] FSO Bucket:*

Listing:
{code:java}
ozone fs -ls -R ofs://ozone1711107600/s3v/chinese-bucket-fso/
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8 -Dsun.jnu.encoding=UTF-8
-rw-rw-rw- 3 systest systest 28 2024-03-27 05:06 ofs://ozone1711107600/s3v/chinese-bucket-fso/客客客.pdf
{code}
Deletion:
{code:java}
ozone fs -rm ofs://ozone1711107600/s3v/chinese-bucket-fso/客客客.pdf
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8 -Dsun.jnu.encoding=UTF-8
24/03/27 05:19:58 INFO Configuration.deprecation: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
24/03/27 05:19:58 INFO om.TrashPolicyOzone: Moved: 'ofs://ozone1711107600/s3v/chinese-bucket-fso/客客客.pdf' to trash at: ofs://ozone1711107600/s3v/chinese-bucket-fso/.Trash/systest/Current/客客客.pdf

ozone fs -ls -R ofs://ozone1711107600/s3v/chinese-bucket-fso/
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8 -Dsun.jnu.encoding=UTF-8
drwxrwxrwx - systest systest 0 2024-03-27 05:19 ofs://ozone1711107600/s3v/chinese-bucket-fso/.Trash
drwxrwxrwx - systest systest 0 2024-03-27 05:19 ofs://ozone1711107600/s3v/chinese-bucket-fso/.Trash/systest
drwxrwxrwx - systest systest 0 2024-03-27 05:19 ofs://ozone1711107600/s3v/chinese-bucket-fso/.Trash/systest/Current
-rw-rw-rw- 3 systest systest 28 2024-03-27 05:06 ofs://ozone1711107600/s3v/chinese-bucket-fso/.Trash/systest/Current/客客客.pdf
{code}
Resolving the ticket; please reopen it if a similar issue is encountered.

> [Ozone-s3api] Chinese characters written via s3api is breaking sh and fs operations
> -----------------------------------------------------------------------------------
>
>                 Key: HDDS-10571
>                 URL: https://issues.apache.org/jira/browse/HDDS-10571
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: S3, s3gateway
>    Affects Versions: 1.5.0
>            Reporter: Soumitra Sulav
>            Assignee: Tanvi Penumudy
>            Priority: Critical
>
> Reading and writing Chinese characters works well with S3 Gateway, but once keys with Chinese characters in the name are created, the other interoperable CLIs such as sh and fs break.
> Tested with both OBS and FSO bucket layouts. Delete / List fail with an error on such a bucket or path.
> {code:java}
> Malformed input or input contains unmappable characters:{code}
> Error in detail :
> {code:java}
> [r...@ccycloud-1.quasar-jqpbgq.root.comops.site multi_upload]# ozone sh bucket info /s3v/chinese-bucket
> {
>   "metadata" : { },
>   "volumeName" : "s3v",
>   "name" : "chinese-bucket",
>   "storageType" : "DISK",
>   "versioning" : false,
>   "usedBytes" : 18000000000,
>   "usedNamespace" : 1,
>   "creationTime" : "2024-03-21T10:19:54.719Z",
>   "modificationTime" : "2024-03-21T10:19:54.719Z",
>   "sourcePathExist" : true,
>   "quotaInBytes" : -1,
>   "quotaInNamespace" : -1,
>   "bucketLayout" : "OBJECT_STORE",
>   "owner" : "om",
>   "link" : false
> }
> [r...@ccycloud-1.quasar-jqpbgq.root.comops.site multi_upload]# ozone sh key list /s3v/chinese-bucket
> Malformed input or input contains unmappable characters: ??
> [r...@ccycloud-1.quasar-jqpbgq.root.comops.site multi_upload]# aws s3api --endpoint https://ccycloud-4.quasar-jqpbgq.root.comops.site:9879/ list-objects --bucket chinese-bucket --no-verify-ssl
> /usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py:1061: InsecureRequestWarning: Unverified HTTPS request is being made to host 'ccycloud-4.quasar-jqpbgq.root.comops.site'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/1.26.x/advanced-usage.html#ssl-warnings
>   warnings.warn(
> {
>     "Contents": [
>         {
>             "Key": "客客",
>             "LastModified": "2024-03-21T10:31:10.512Z",
>             "ETag": "2024-03-21T10:31:10.512Z",
>             "Size": 6000000000,
>             "StorageClass": "STANDARD"
>         }
>     ],
>     "RequestCharged": null
> }
> [r...@ccycloud-1.quasar-jqpbgq.root.comops.site multi_upload]# ozone sh key delete /s3v/chinese-bucket/客客
> KEY_NOT_FOUND Key not found[r...@ccycloud-1.quasar-jqpbgq.root.comops.site multi_upload]# ozone sh bucket info /linkvol1710602076/gkpznymu
> {
>   "metadata" : { },
>   "volumeName" : "linkvol1710602076",
>   "name" : "gkpznymu",
>   "storageType" : "DISK",
>   "versioning" : false,
>   "usedBytes" : 36094371864,
>   "usedNamespace" : 7,
>   "creationTime" : "2024-03-21T08:01:57.656Z",
>   "modificationTime" : "2024-03-21T09:03:41.356Z",
>   "sourcePathExist" : true,
>   "quotaInBytes" : -1,
>   "quotaInNamespace" : -1,
>   "bucketLayout" : "FILE_SYSTEM_OPTIMIZED",
>   "owner" : "om",
>   "link" : false
> }
> [r...@ccycloud-1.quasar-jqpbgq.root.comops.site multi_upload]# ozone fs -ls -R ofs://ozone1710999924/linkvol1710602076/gkpznymu
> -ls: Malformed input or input contains unmappable characters: ?
> Usage: ozone fs [generic options]
> {code}
>
> Steps to reproduce :
> {code:java}
> aws s3 cp 客客 s3://chi --endpoint https://ccycloud-4.quasar-jqpbgq.root.comops.site:9879/ --no-verify-ssl
> # To validate
> aws s3api list-objects --bucket chi --endpoint https://ccycloud-4.quasar-jqpbgq.root.comops.site:9879/ --no-verify-ssl
> ozone sh key list /s3v/chi
> ## Malformed input or input contains unmappable characters: ??
> {code}
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org
For additional commands, e-mail: issues-h...@ozone.apache.org