[ https://issues.apache.org/jira/browse/HDFS-12506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16178495#comment-16178495 ]
Weiwei Yang commented on HDFS-12506:
------------------------------------
Hi [~linyiqun],
I just uploaded the v7 patch, which hopefully fixes the javadoc warnings.
Regarding your comment:
bq. getSequentialRangeKVs can also make sense in listKeys
Actually, there are more places that should be switched to
{{getSequentialRangeKVs}}. I did not include them in this patch because I
haven't tested them all. I will open another JIRA to track this and make sure
they get fixed with sufficient testing. Let's keep this JIRA focused on fixing
the {{listBucket}} issue. Does that sound good to you?
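To make the listKeys point concrete, here is a rough sketch contrasting a filtered range scan with a sequential one. It assumes, going only by the method name, that {{getSequentialRangeKVs}} returns a contiguous run of entries starting from a key and stops at the first entry that fails the filter. The class, record, and method names below are hypothetical and are not the Ozone MetadataStore API; a {{TreeMap}} stands in for the sorted KSM DB.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.function.Predicate;

// Hypothetical sketch, NOT the Ozone MetadataStore API. A TreeMap stands in
// for the sorted key-value store behind ksm.db.
public class RangeScanSketch {

  /** Matching keys plus how many entries the scan had to visit. */
  record ScanResult(List<String> keys, int visited) {}

  /** Filtered range scan: skips entries that fail the filter and keeps going. */
  static ScanResult filteredRange(NavigableMap<String, String> db, String startKey,
      int count, Predicate<String> filter) {
    List<String> keys = new ArrayList<>();
    int visited = 0;
    for (String key : db.tailMap(startKey, true).keySet()) {
      if (keys.size() >= count) {
        break;
      }
      visited++;
      if (filter.test(key)) {
        keys.add(key);
      }
    }
    return new ScanResult(keys, visited);
  }

  /** Sequential range scan (assumed semantics): stops at the first entry that fails the filter. */
  static ScanResult sequentialRange(NavigableMap<String, String> db, String startKey,
      int count, Predicate<String> filter) {
    List<String> keys = new ArrayList<>();
    int visited = 0;
    for (String key : db.tailMap(startKey, true).keySet()) {
      if (keys.size() >= count) {
        break;
      }
      visited++;
      if (!filter.test(key)) {
        break; // first non-matching entry ends the contiguous run
      }
      keys.add(key);
    }
    return new ScanResult(keys, visited);
  }

  public static void main(String[] args) {
    NavigableMap<String, String> db = new TreeMap<>();
    for (String k : List.of("/v1/b1/k1", "/v1/b1/k2", "/v1/b2/k1", "/v1/b3/k1")) {
      db.put(k, "");
    }
    Predicate<String> inB1 = k -> k.startsWith("/v1/b1/");
    System.out.println(filteredRange(db, "/v1/b1/", 10, inB1));
    System.out.println(sequentialRange(db, "/v1/b1/", 10, inB1));
  }
}
```

Both scans return the same two keys, but because keys sharing a prefix are contiguous in sorted order, the sequential scan can stop as soon as it leaves the bucket's prefix instead of visiting the rest of the volume. The gap is one entry here and grows with the number of keys after the bucket.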
[~anu], thanks for reviewing this patch. Since your comments are not about
changes introduced by this patch, I have opened a separate, lower-priority
cleanup JIRA, HDFS-12539, to get them fixed. Does that sound good to you?
Thanks
> Ozone: ListBucket is too slow
> -----------------------------
>
> Key: HDFS-12506
> URL: https://issues.apache.org/jira/browse/HDFS-12506
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: ozone
> Reporter: Weiwei Yang
> Assignee: Weiwei Yang
> Priority: Blocker
> Labels: ozoneMerge, performance
> Attachments: HDFS-12506-HDFS-7240.001.patch,
> HDFS-12506-HDFS-7240.002.patch, HDFS-12506-HDFS-7240.003.patch,
> HDFS-12506-HDFS-7240.004.patch, HDFS-12506-HDFS-7240.005.patch,
> HDFS-12506-HDFS-7240.006.patch, HDFS-12506-HDFS-7240.007.patch
>
>
> Generated 3 million keys in Ozone and ran the {{listBucket}} command to get a
> list of buckets under a volume:
> {code}
> bin/hdfs oz -listBucket http://15oz1.fyre.ibm.com:9864/vol-0-15143 -user wwei
> {code}
> This call took over *15 seconds* to finish. The problem is caused by the
> inflexible structure of the KSM DB. Currently {{ksm.db}} stores keys as
> follows:
> {code}
> /v1/b1
> /v1/b1/k1
> /v1/b1/k2
> /v1/b1/k3
> /v1/b2
> /v1/b2/k1
> /v1/b2/k2
> /v1/b2/k3
> /v1/b3
> /v1/b4
> {code}
> Keys are sorted in natural order, so to list the buckets under a volume, e.g.
> /v1, we seek to /v1 and then iterate over and filter the keys, which ends up
> scanning every key under volume /v1. The problem with this design is that
> there is no efficient way to locate all buckets without scanning the keys.
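The scan cost described above can be sketched as follows. This is only an illustration under stated assumptions: a {{TreeMap}} stands in for the sorted {{ksm.db}}, the keys mirror the example layout in the description, and {{listBuckets}} is a hypothetical helper, not the KSM implementation. It seeks to the volume prefix, walks forward in sorted order, and treats any key with no further '/' after the volume prefix as a bucket.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical sketch of listing buckets over the flat ksm.db layout.
// A TreeMap stands in for the sorted KSM DB; this is not the KSM code.
public class ListBucketScanSketch {

  /** Buckets found plus how many keys had to be scanned to find them. */
  record ListResult(List<String> buckets, int scanned) {}

  static ListResult listBuckets(NavigableMap<String, String> db, String volume) {
    String prefix = volume + "/";
    List<String> buckets = new ArrayList<>();
    int scanned = 0;
    // Seek to the volume prefix, then walk forward in sorted order.
    for (String key : db.tailMap(prefix, true).keySet()) {
      if (!key.startsWith(prefix)) {
        break; // left this volume's key range
      }
      scanned++;
      // A bucket key has no further '/' after the volume prefix; object keys do.
      if (key.indexOf('/', prefix.length()) < 0) {
        buckets.add(key);
      }
    }
    return new ListResult(buckets, scanned);
  }

  public static void main(String[] args) {
    NavigableMap<String, String> db = new TreeMap<>();
    for (String k : List.of("/v1/b1", "/v1/b1/k1", "/v1/b1/k2", "/v1/b1/k3",
        "/v1/b2", "/v1/b2/k1", "/v1/b2/k2", "/v1/b2/k3", "/v1/b3", "/v1/b4")) {
      db.put(k, "");
    }
    ListResult r = listBuckets(db, "/v1");
    System.out.println(r.buckets() + " found after scanning " + r.scanned() + " keys");
  }
}
```

For the ten example keys it returns the four buckets but has to scan all ten keys. With millions of object keys under a volume, the work grows with the number of keys rather than the number of buckets, which is consistent with the slow listing reported above.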
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)