[
https://issues.apache.org/jira/browse/HDDS-10571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17838533#comment-17838533
]
Tanvi Penumudy commented on HDDS-10571:
---------------------------------------
After (re)setting the environment settings below on a test cluster, I was
able to get both the *sh* and *fs* commands to work with filenames that
contain Chinese characters:
{code:java}
export JAVA_TOOL_OPTIONS="-Dfile.encoding=UTF-8 -Dsun.jnu.encoding=UTF-8"
localedef -i en_US -f UTF-8 en_US.UTF-8
localedef -i zh_CN -f UTF-8 zh_CN.UTF-8
export LANG=en_US.UTF-8
export LC_CTYPE="zh_CN.UTF-8"
export LC_MESSAGES="zh_CN.UTF-8"
export LC_ALL="zh_CN.UTF-8"
locale
LANG=en_US.UTF-8
LC_CTYPE="zh_CN.UTF-8"
LC_NUMERIC="zh_CN.UTF-8"
LC_TIME="zh_CN.UTF-8"
LC_COLLATE="zh_CN.UTF-8"
LC_MONETARY="zh_CN.UTF-8"
LC_MESSAGES="zh_CN.UTF-8"
LC_PAPER="zh_CN.UTF-8"
LC_NAME="zh_CN.UTF-8"
LC_ADDRESS="zh_CN.UTF-8"
LC_TELEPHONE="zh_CN.UTF-8"
LC_MEASUREMENT="zh_CN.UTF-8"
LC_IDENTIFICATION="zh_CN.UTF-8"
LC_ALL=zh_CN.UTF-8
locale charmap
UTF-8
{code}
These settings ensure that the system has UTF-8 locale definitions for both
English (United States) and Chinese (Simplified, China), which are needed to
handle and display text in these languages, and that the JVM defaults to UTF-8
for file contents and file names.
With these properties (re)set, the sh and fs commands work as expected.
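To confirm that a JVM actually picks up these settings, a small standalone check can help. The sketch below is not part of Ozone: the class name EncodingCheck and its helper canEncode are hypothetical, and it only reports what the JVM sees and whether a sample name is representable in the default charset. It does not reproduce the Ozone code path that raised the original error.

```java
import java.nio.charset.Charset;

// Hypothetical diagnostic, not part of Ozone.
class EncodingCheck {

    // True if every character of `name` can be encoded in `cs`.
    static boolean canEncode(String name, Charset cs) {
        return cs.newEncoder().canEncode(name);
    }

    public static void main(String[] args) {
        // Default sample is the key name from this ticket, written with
        // Unicode escapes so the source file encoding does not matter.
        String name = args.length > 0 ? args[0] : "\u5BA2\u5BA2\u5BA2.pdf";
        Charset dflt = Charset.defaultCharset();

        System.out.println("file.encoding    = " + System.getProperty("file.encoding"));
        System.out.println("sun.jnu.encoding = " + System.getProperty("sun.jnu.encoding"));
        System.out.println("default charset  = " + dflt);
        System.out.println("encodable in default charset: " + canEncode(name, dflt));
    }
}
```

On Java 11 or later this can be run directly with `java EncodingCheck.java`, once with and once without JAVA_TOOL_OPTIONS exported, to compare the reported values. Note that on JDK 18 and later file.encoding already defaults to UTF-8, while sun.jnu.encoding still follows the locale.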
*[sh commands] OBS Bucket:*
Key List:
{code:java}
ozone sh key list /s3v/chinese-bucket
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8 -Dsun.jnu.encoding=UTF-8
[ {
"volumeName" : "s3v",
"bucketName" : "chinese-bucket",
"name" : "客客客.pdf",
"dataSize" : 28,
"creationTime" : "2024-03-27T04:57:36.879Z",
"modificationTime" : "2024-03-27T04:57:38.061Z",
"replicationConfig" : {
"replicationFactor" : "THREE",
"requiredNodes" : 3,
"replicationType" : "RATIS"
},
"metadata" : { },
"file" : true
} ]
{code}
Key Delete:
{code:java}
ozone sh key delete /s3v/chinese-bucket/客客客.pdf
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8 -Dsun.jnu.encoding=UTF-8
ozone sh key list /s3v/chinese-bucket
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8 -Dsun.jnu.encoding=UTF-8
[ ]
{code}
*[fs commands] FSO Bucket:*
Listing:
{code:java}
ozone fs -ls -R ofs://ozone1711107600/s3v/chinese-bucket-fso/
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8 -Dsun.jnu.encoding=UTF-8
-rw-rw-rw- 3 systest systest 28 2024-03-27 05:06
ofs://ozone1711107600/s3v/chinese-bucket-fso/客客客.pdf
{code}
Deletion:
{code:java}
ozone fs -rm ofs://ozone1711107600/s3v/chinese-bucket-fso/客客客.pdf
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8 -Dsun.jnu.encoding=UTF-8
24/03/27 05:19:58 INFO Configuration.deprecation: io.bytes.per.checksum is
deprecated. Instead, use dfs.bytes-per-checksum
24/03/27 05:19:58 INFO om.TrashPolicyOzone: Moved:
'ofs://ozone1711107600/s3v/chinese-bucket-fso/客客客.pdf' to trash at:
ofs://ozone1711107600/s3v/chinese-bucket-fso/.Trash/systest/Current/客客客.pdf
ozone fs -ls -R ofs://ozone1711107600/s3v/chinese-bucket-fso/
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8 -Dsun.jnu.encoding=UTF-8
drwxrwxrwx - systest systest 0 2024-03-27 05:19
ofs://ozone1711107600/s3v/chinese-bucket-fso/.Trash
drwxrwxrwx - systest systest 0 2024-03-27 05:19
ofs://ozone1711107600/s3v/chinese-bucket-fso/.Trash/systest
drwxrwxrwx - systest systest 0 2024-03-27 05:19
ofs://ozone1711107600/s3v/chinese-bucket-fso/.Trash/systest/Current
-rw-rw-rw- 3 systest systest 28 2024-03-27 05:06
ofs://ozone1711107600/s3v/chinese-bucket-fso/.Trash/systest/Current/客客客.pdf
{code}
Resolving the ticket. Please reopen it if a similar issue is encountered!
> [Ozone-s3api] Chinese characters written via s3api is breaking sh and fs
> operations
> -----------------------------------------------------------------------------------
>
> Key: HDDS-10571
> URL: https://issues.apache.org/jira/browse/HDDS-10571
> Project: Apache Ozone
> Issue Type: Bug
> Components: S3, s3gateway
> Affects Versions: 1.5.0
> Reporter: Soumitra Sulav
> Assignee: Tanvi Penumudy
> Priority: Critical
>
> Reading and writing keys with Chinese characters works well through the S3
> Gateway, but once such keys are created, the other interoperable CLIs such as
> sh and fs break.
> Tested with both OBS and FSO bucket layouts. Delete and list fail with an
> error on such a bucket or path.
> {code:java}
> Malformed input or input contains unmappable characters:{code}
> Error in detail:
> {code:java}
> [[email protected] multi_upload]# ozone sh
> bucket info /s3v/chinese-bucket
> {
> "metadata" : { },
> "volumeName" : "s3v",
> "name" : "chinese-bucket",
> "storageType" : "DISK",
> "versioning" : false,
> "usedBytes" : 18000000000,
> "usedNamespace" : 1,
> "creationTime" : "2024-03-21T10:19:54.719Z",
> "modificationTime" : "2024-03-21T10:19:54.719Z",
> "sourcePathExist" : true,
> "quotaInBytes" : -1,
> "quotaInNamespace" : -1,
> "bucketLayout" : "OBJECT_STORE",
> "owner" : "om",
> "link" : false
> }
> [[email protected] multi_upload]# ozone sh key
> list /s3v/chinese-bucket
> Malformed input or input contains unmappable characters: ??
> [[email protected] multi_upload]# aws s3api
> --endpoint https://ccycloud-4.quasar-jqpbgq.root.comops.site:9879/
> list-objects --bucket chinese-bucket --no-verify-ssl
> /usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py:1061:
> InsecureRequestWarning: Unverified HTTPS request is being made to host
> 'ccycloud-4.quasar-jqpbgq.root.comops.site'. Adding certificate verification
> is strongly advised. See:
> https://urllib3.readthedocs.io/en/1.26.x/advanced-usage.html#ssl-warnings
> warnings.warn(
> {
> "Contents": [
> {
> "Key": "客客",
> "LastModified": "2024-03-21T10:31:10.512Z",
> "ETag": "2024-03-21T10:31:10.512Z",
> "Size": 6000000000,
> "StorageClass": "STANDARD"
> }
> ],
> "RequestCharged": null
> }
> [[email protected] multi_upload]# ozone sh key
> delete /s3v/chinese-bucket/客客
> KEY_NOT_FOUND Key not found
> [[email protected] multi_upload]# ozone sh bucket info /linkvol1710602076/gkpznymu
> {
> "metadata" : { },
> "volumeName" : "linkvol1710602076",
> "name" : "gkpznymu",
> "storageType" : "DISK",
> "versioning" : false,
> "usedBytes" : 36094371864,
> "usedNamespace" : 7,
> "creationTime" : "2024-03-21T08:01:57.656Z",
> "modificationTime" : "2024-03-21T09:03:41.356Z",
> "sourcePathExist" : true,
> "quotaInBytes" : -1,
> "quotaInNamespace" : -1,
> "bucketLayout" : "FILE_SYSTEM_OPTIMIZED",
> "owner" : "om",
> "link" : false
> }
> [[email protected] multi_upload]# ozone fs -ls
> -R ofs://ozone1710999924/linkvol1710602076/gkpznymu
> -ls: Malformed input or input contains unmappable characters: ?
> Usage: ozone fs [generic options] {code}
>
> Steps to reproduce:
> {code:java}
> aws s3 cp 客客 s3://chi --endpoint
> https://ccycloud-4.quasar-jqpbgq.root.comops.site:9879/ --no-verify-ssl
> # To validate
> aws s3api list-objects --bucket chi --endpoint
> https://ccycloud-4.quasar-jqpbgq.root.comops.site:9879/ --no-verify-ssl
> ozone sh key list /s3v/chi
> ## Malformed input or input contains unmappable characters: ??
> {code}
>