[ 
https://issues.apache.org/jira/browse/HDDS-10571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17838533#comment-17838533
 ] 

Tanvi Penumudy edited comment on HDDS-10571 at 4/18/24 8:43 AM:
----------------------------------------------------------------

After (re)setting the environment settings below on a test cluster, I was
able to get both the *sh* and *fs* commands to work with filenames that
contain Chinese characters:
{code:java}
export JAVA_TOOL_OPTIONS="-Dfile.encoding=UTF-8 -Dsun.jnu.encoding=UTF-8"

# Compile the English and Simplified Chinese UTF-8 locales
localedef -i en_US -f UTF-8 en_US.UTF-8
localedef -i zh_CN -f UTF-8 zh_CN.UTF-8

export LANG=en_US.UTF-8
export LC_CTYPE="zh_CN.UTF-8"
export LC_MESSAGES="zh_CN.UTF-8"
export LC_ALL="zh_CN.UTF-8"

locale
LANG=en_US.UTF-8
LC_CTYPE="zh_CN.UTF-8"
LC_NUMERIC="zh_CN.UTF-8"
LC_TIME="zh_CN.UTF-8"
LC_COLLATE="zh_CN.UTF-8"
LC_MONETARY="zh_CN.UTF-8"
LC_MESSAGES="zh_CN.UTF-8"
LC_PAPER="zh_CN.UTF-8"
LC_NAME="zh_CN.UTF-8"
LC_ADDRESS="zh_CN.UTF-8"
LC_TELEPHONE="zh_CN.UTF-8"
LC_MEASUREMENT="zh_CN.UTF-8"
LC_IDENTIFICATION="zh_CN.UTF-8"
LC_ALL=zh_CN.UTF-8

locale charmap
UTF-8
{code}
The above settings ensure that the system has locale files for both English
(United States) and Chinese (Simplified, China) with UTF-8 encoding, which is
needed to display text in these languages.
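
As a quick sanity check before running the Ozone CLI, the settings above can be verified with a small shell sketch. The helper names (`normalize_locale`, `locale_in_list`, `charmap_is_utf8`, `has_utf8_java_opts`) are hypothetical and not part of Ozone. The normalization assumes glibc-style `locale -a` output, which typically prints `zh_CN.utf8` rather than `zh_CN.UTF-8`.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of Ozone) for checking the locale setup.

# Lower-case a locale name and fold "utf-8" to "utf8" so that
# "zh_CN.UTF-8" and "zh_CN.utf8" compare equal.
normalize_locale() {
  printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | sed 's/utf-8/utf8/'
}

# Succeed if locale $1 appears in the list read from stdin
# (intended input: the output of `locale -a`).
locale_in_list() {
  local want line
  want=$(normalize_locale "$1")
  while IFS= read -r line; do
    [ "$(normalize_locale "$line")" = "$want" ] && return 0
  done
  return 1
}

# Succeed if the given charmap name (e.g. from `locale charmap`) is UTF-8.
charmap_is_utf8() {
  [ "$(normalize_locale "$1")" = "utf8" ]
}

# Succeed if the given option string carries both JVM encoding flags.
has_utf8_java_opts() {
  case "$1" in
    *-Dfile.encoding=UTF-8*) ;;
    *) return 1 ;;
  esac
  case "$1" in
    *-Dsun.jnu.encoding=UTF-8*) return 0 ;;
    *) return 1 ;;
  esac
}
```

Typical use would be `locale -a | locale_in_list zh_CN.UTF-8`, `charmap_is_utf8 "$(locale charmap)"`, and `has_utf8_java_opts "${JAVA_TOOL_OPTIONS:-}"`, each of which returns a non-zero status when the corresponding setting is missing.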

After (re)setting the above properties, the sh and fs commands work fine if
they did not already. The commands below are examples, not an exhaustive list.

 

*[sh commands] OBS Bucket:*

Key List:
{code:java}
ozone sh key list /s3v/chinese-bucket
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8 -Dsun.jnu.encoding=UTF-8
[ {
  "volumeName" : "s3v",
  "bucketName" : "chinese-bucket",
  "name" : "客客客.pdf",
  "dataSize" : 28,
  "creationTime" : "2024-03-27T04:57:36.879Z",
  "modificationTime" : "2024-03-27T04:57:38.061Z",
  "replicationConfig" : {
    "replicationFactor" : "THREE",
    "requiredNodes" : 3,
    "replicationType" : "RATIS"
  },
  "metadata" : { },
  "file" : true
} ]
{code}
 

Key Delete:
{code:java}
ozone sh key delete /s3v/chinese-bucket/客客客.pdf
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8 -Dsun.jnu.encoding=UTF-8

ozone sh key list /s3v/chinese-bucket
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8 -Dsun.jnu.encoding=UTF-8
[ ]
{code}
 

*[fs commands] FSO Bucket:*

Listing:
{code:java}
ozone fs -ls -R ofs://ozone1711107600/s3v/chinese-bucket-fso/
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8 -Dsun.jnu.encoding=UTF-8
-rw-rw-rw-   3 systest systest         28 2024-03-27 05:06 
ofs://ozone1711107600/s3v/chinese-bucket-fso/客客客.pdf
{code}
 

Deletion:
{code:java}
ozone fs -rm ofs://ozone1711107600/s3v/chinese-bucket-fso/客客客.pdf
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8 -Dsun.jnu.encoding=UTF-8
24/03/27 05:19:58 INFO Configuration.deprecation: io.bytes.per.checksum is 
deprecated. Instead, use dfs.bytes-per-checksum
24/03/27 05:19:58 INFO om.TrashPolicyOzone: Moved: 
'ofs://ozone1711107600/s3v/chinese-bucket-fso/客客客.pdf' to trash at: 
ofs://ozone1711107600/s3v/chinese-bucket-fso/.Trash/systest/Current/客客客.pdf

ozone fs -ls -R ofs://ozone1711107600/s3v/chinese-bucket-fso/
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8 -Dsun.jnu.encoding=UTF-8
drwxrwxrwx   - systest systest          0 2024-03-27 05:19 
ofs://ozone1711107600/s3v/chinese-bucket-fso/.Trash
drwxrwxrwx   - systest systest          0 2024-03-27 05:19 
ofs://ozone1711107600/s3v/chinese-bucket-fso/.Trash/systest
drwxrwxrwx   - systest systest          0 2024-03-27 05:19 
ofs://ozone1711107600/s3v/chinese-bucket-fso/.Trash/systest/Current
-rw-rw-rw-   3 systest systest         28 2024-03-27 05:06 
ofs://ozone1711107600/s3v/chinese-bucket-fso/.Trash/systest/Current/客客客.pdf
{code}
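
The outputs above were checked by eye. For a repeatable check, a small shell sketch can scan captured CLI output for the error string quoted in the ticket description. The helper name `has_encoding_error` is hypothetical and not part of Ozone; it only matches the message text reported in this ticket.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Ozone): succeed if the text on stdin
# contains the encoding error reported in HDDS-10571.
has_encoding_error() {
  grep -q 'Malformed input or input contains unmappable characters'
}
```

For example, `ozone sh key list /s3v/chinese-bucket 2>&1 | has_encoding_error` would return success only if the listing still fails with the encoding error.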
 

Resolving the ticket. Please reopen it if a similar issue is encountered.



> [Ozone-s3api] Chinese characters written via s3api is breaking sh and fs 
> operations
> -----------------------------------------------------------------------------------
>
>                 Key: HDDS-10571
>                 URL: https://issues.apache.org/jira/browse/HDDS-10571
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: S3, s3gateway
>    Affects Versions: 1.5.0
>            Reporter: Soumitra Sulav
>            Assignee: Tanvi Penumudy
>            Priority: Critical
>
> Reading and writing keys with Chinese characters works well with S3 Gateway,
> but once keys with Chinese characters in the name are created, other
> interoperable CLIs such as sh and fs break.
> Tested with both OBS and FSO bucket layouts. Delete and List fail with an
> error on such a bucket or path.
> {code:java}
> Malformed input or input contains unmappable characters:{code}
> Error in detail:
> {code:java}
> [r...@ccycloud-1.quasar-jqpbgq.root.comops.site multi_upload]# ozone sh 
> bucket info /s3v/chinese-bucket
> {
>   "metadata" : { },
>   "volumeName" : "s3v",
>   "name" : "chinese-bucket",
>   "storageType" : "DISK",
>   "versioning" : false,
>   "usedBytes" : 18000000000,
>   "usedNamespace" : 1,
>   "creationTime" : "2024-03-21T10:19:54.719Z",
>   "modificationTime" : "2024-03-21T10:19:54.719Z",
>   "sourcePathExist" : true,
>   "quotaInBytes" : -1,
>   "quotaInNamespace" : -1,
>   "bucketLayout" : "OBJECT_STORE",
>   "owner" : "om",
>   "link" : false
> }
> [r...@ccycloud-1.quasar-jqpbgq.root.comops.site multi_upload]# ozone sh key 
> list /s3v/chinese-bucket
> Malformed input or input contains unmappable characters: ??
> [r...@ccycloud-1.quasar-jqpbgq.root.comops.site multi_upload]# aws s3api 
> --endpoint https://ccycloud-4.quasar-jqpbgq.root.comops.site:9879/ 
> list-objects --bucket chinese-bucket --no-verify-ssl
> /usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py:1061: 
> InsecureRequestWarning: Unverified HTTPS request is being made to host 
> 'ccycloud-4.quasar-jqpbgq.root.comops.site'. Adding certificate verification 
> is strongly advised. See: 
> https://urllib3.readthedocs.io/en/1.26.x/advanced-usage.html#ssl-warnings
>   warnings.warn(
> {
>     "Contents": [
>         {
>             "Key": "客客",
>             "LastModified": "2024-03-21T10:31:10.512Z",
>             "ETag": "2024-03-21T10:31:10.512Z",
>             "Size": 6000000000,
>             "StorageClass": "STANDARD"
>         }
>     ],
>     "RequestCharged": null
> }
> [r...@ccycloud-1.quasar-jqpbgq.root.comops.site multi_upload]# ozone sh key 
> delete /s3v/chinese-bucket/客客
> KEY_NOT_FOUND Key not found
> [r...@ccycloud-1.quasar-jqpbgq.root.comops.site multi_upload]# ozone sh 
> bucket info /linkvol1710602076/gkpznymu
> {
>   "metadata" : { },
>   "volumeName" : "linkvol1710602076",
>   "name" : "gkpznymu",
>   "storageType" : "DISK",
>   "versioning" : false,
>   "usedBytes" : 36094371864,
>   "usedNamespace" : 7,
>   "creationTime" : "2024-03-21T08:01:57.656Z",
>   "modificationTime" : "2024-03-21T09:03:41.356Z",
>   "sourcePathExist" : true,
>   "quotaInBytes" : -1,
>   "quotaInNamespace" : -1,
>   "bucketLayout" : "FILE_SYSTEM_OPTIMIZED",
>   "owner" : "om",
>   "link" : false
> }
> [r...@ccycloud-1.quasar-jqpbgq.root.comops.site multi_upload]# ozone fs -ls 
> -R ofs://ozone1710999924/linkvol1710602076/gkpznymu
> -ls: Malformed input or input contains unmappable characters: ?
> Usage: ozone fs [generic options] {code}
>  
> Steps to reproduce:
> {code:java}
> aws s3 cp 客客 s3://chi --endpoint 
> https://ccycloud-4.quasar-jqpbgq.root.comops.site:9879/ --no-verify-ssl
> # To validate
> aws s3api list-objects --bucket chi --endpoint 
> https://ccycloud-4.quasar-jqpbgq.root.comops.site:9879/ --no-verify-ssl
> ozone sh key list /s3v/chi
> ## Malformed input or input contains unmappable characters: ??
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
