[ 
https://issues.apache.org/jira/browse/HDFS-12506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16178495#comment-16178495
 ] 

Weiwei Yang commented on HDFS-12506:
------------------------------------

Hi [~linyiqun]

I just uploaded the v7 patch, which hopefully fixes the javadoc warnings. 
Regarding your comment:

bq. getSequentialRangeKVs can also make sense in listKeys

Actually, there are more places that should be replaced with 
{{getSequentialRangeKVs}}. I did not include them in this patch because I 
haven't tested them all. I will open another JIRA to track this and make 
sure they get fixed with sufficient testing. Let's keep this JIRA focused on 
fixing the {{listBucket}} issue. Does that sound good to you?

[~anu], thanks for reviewing this patch. Since your comments are not about 
changes introduced by this patch, I have opened a separate, lower-priority 
cleanup JIRA, HDFS-12539, to get those fixed. Does that sound good to you?

Thanks

> Ozone: ListBucket is too slow
> -----------------------------
>
>                 Key: HDFS-12506
>                 URL: https://issues.apache.org/jira/browse/HDFS-12506
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: ozone
>            Reporter: Weiwei Yang
>            Assignee: Weiwei Yang
>            Priority: Blocker
>              Labels: ozoneMerge, performance
>         Attachments: HDFS-12506-HDFS-7240.001.patch, 
> HDFS-12506-HDFS-7240.002.patch, HDFS-12506-HDFS-7240.003.patch, 
> HDFS-12506-HDFS-7240.004.patch, HDFS-12506-HDFS-7240.005.patch, 
> HDFS-12506-HDFS-7240.006.patch, HDFS-12506-HDFS-7240.007.patch
>
>
> Generated 3 million keys in ozone, then ran the {{listBucket}} command to 
> get a list of buckets under a volume:
> {code}
> bin/hdfs oz -listBucket http://15oz1.fyre.ibm.com:9864/vol-0-15143 -user wwei
> {code}
> This call took over *15 seconds* to finish. The problem is caused by the 
> inflexible structure of the KSM DB. Right now {{ksm.db}} stores keys like 
> the following:
> {code}
> /v1/b1
> /v1/b1/k1
> /v1/b1/k2
> /v1/b1/k3
> /v1/b2
> /v1/b2/k1
> /v1/b2/k2
> /v1/b2/k3
> /v1/b3
> /v1/b4
> {code}
> Keys are sorted in natural order, so when we list buckets under a volume, 
> e.g. /v1, we need to seek to the /v1 point and then iterate and filter keys, 
> which ends up scanning all keys under volume /v1. The problem with this 
> design is that we have no efficient way to locate all buckets without 
> scanning the keys.
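
To make the cost described above concrete, here is a minimal sketch that uses a {{TreeMap}} as a stand-in for the sorted {{ksm.db}}. The class and method names ({{BucketListingSketch}}, {{listBucketsByScan}}, {{listBucketsBySeek}}) are hypothetical and are not taken from any patch on this JIRA. The first method filters every key under the volume, as the description says the current code does. The second shows one possible alternative, which is only an assumption for illustration: whenever the iteration lands on a key entry, it seeks past that bucket's whole key range.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical illustration only; a TreeMap stands in for the sorted ksm.db.
final class BucketListingSketch {

  /** Builds the sample key layout shown in the description. */
  static NavigableMap<String, String> sampleDb() {
    NavigableMap<String, String> db = new TreeMap<>();
    String[] keys = {
        "/v1/b1", "/v1/b1/k1", "/v1/b1/k2", "/v1/b1/k3",
        "/v1/b2", "/v1/b2/k1", "/v1/b2/k2", "/v1/b2/k3",
        "/v1/b3", "/v1/b4"};
    for (String k : keys) {
      db.put(k, "");
    }
    return db;
  }

  /** Visits every entry under the volume and keeps only bucket entries. */
  static List<String> listBucketsByScan(
      NavigableMap<String, String> db, String volume, int[] visited) {
    String prefix = volume + "/";
    List<String> buckets = new ArrayList<>();
    for (String key : db.tailMap(prefix, true).keySet()) {
      if (!key.startsWith(prefix)) {
        break; // left the volume's key range
      }
      visited[0]++;
      if (key.indexOf('/', prefix.length()) < 0) {
        buckets.add(key); // no further '/', so this is a bucket entry
      }
    }
    return buckets;
  }

  /** Seeks past each bucket's key range instead of iterating through it. */
  static List<String> listBucketsBySeek(
      NavigableMap<String, String> db, String volume, int[] visited) {
    String prefix = volume + "/";
    List<String> buckets = new ArrayList<>();
    String key = db.ceilingKey(prefix);
    while (key != null && key.startsWith(prefix)) {
      visited[0]++;
      int slash = key.indexOf('/', prefix.length());
      if (slash < 0) {
        buckets.add(key);
        key = db.higherKey(key);
      } else {
        // '0' is the character right after '/', so bucket + "0" is the
        // smallest string that sorts after every bucket + "/..." key.
        key = db.ceilingKey(key.substring(0, slash) + "0");
      }
    }
    return buckets;
  }

  public static void main(String[] args) {
    NavigableMap<String, String> db = sampleDb();
    int[] scanVisited = {0};
    int[] seekVisited = {0};
    System.out.println(listBucketsByScan(db, "/v1", scanVisited));
    System.out.println(listBucketsBySeek(db, "/v1", seekVisited));
    System.out.println(scanVisited[0] + " vs " + seekVisited[0] + " entries");
  }
}
```

On the sample layout the scan touches all 10 entries while the seek variant touches 6. With millions of keys per bucket, the scan grows with the number of keys, whereas the seek variant needs at most two lookups per bucket.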



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

