[
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14615639#comment-14615639
]
Colin Patrick McCabe commented on HDFS-7240:
--------------------------------------------
bq. 3) We will partition objects using hash partitioning and range
partitioning. The document already talks about hash partitioning, I will add
more details for range partition support. The small objects will also be stored
in the container. In the document I mentioned leveldbjni, but we are also
looking at rocksDB for container implementation to store objects in the
container.
Interesting, thanks for posting more details about this.
Maybe this has been discussed on another JIRA (I apologize if so), but does
this mean that the admin will have to choose between hash and range
partitioning for a particular bucket? This seems suboptimal to me... we will
have to support both approaches, which is more complex, and admins will be left
with a difficult choice.
It seems better to just make everything range-partitioned. Although this is
more complex than simple hash partitioning, it provides "performance
compatibility" with s3 and other object stores. s3 provides a fast
(sub-linear) way of getting all the keys between some A and B. It will be
very difficult to position ozone as s3-compatible if operations that are quick
in s3, such as listing all the keys between A and B, degenerate into full-bucket
scans in ozone (and recursive listings built on top of them into O(num_keys^2)).
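To make the cost difference concrete, here is a minimal, purely illustrative
sketch (the class and method names are hypothetical, not anything in ozone or
s3): a sorted index answers "all keys between A and B" in roughly
O(log N + K) for K results, while a hash index can only answer the same query
by scanning every key.
{code:java}
import java.util.*;

// Illustrative only: compares a range-partition-friendly sorted index with a
// hash index for the query "all keys between A and B".
class KeyRangeSketch {
  // Sorted index: a range query is a tree descent plus the matching keys,
  // which is the sub-linear behavior range partitioning preserves.
  static List<String> listRangeSorted(NavigableMap<String, byte[]> index,
                                      String from, String to) {
    return new ArrayList<>(index.subMap(from, true, to, false).keySet());
  }

  // Hash index: the only way to answer the same query is a full scan of all
  // N keys, which is what makes listing slow under hash partitioning.
  static List<String> listRangeHashed(Map<String, byte[]> index,
                                      String from, String to) {
    List<String> result = new ArrayList<>();
    for (String key : index.keySet()) {
      if (key.compareTo(from) >= 0 && key.compareTo(to) < 0) {
        result.add(key);
      }
    }
    Collections.sort(result);
    return result;
  }
}
{code}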
For example, in (all of) Hadoop's s3 filesystem implementations, listStatus
uses this quick listing of keys between A and B. When someone does "listStatus
/a/b/c", we can ask s3 for all the keys between /a/b/c/ and /a/b/c0 (0 is the
ASCII value right after slash). Of course, s3 does not really have
directories, but we can treat the keys in this range as being in the directory
/a/b/c for the purposes of s3a or s3n. If we just had hash partitioning, each
such listing would require a scan of every key in the bucket, and a recursive
listing would be O(N^2) where N is the number of keys. That would be
infeasible for any large bucket.
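A tiny sketch of that listStatus trick (hypothetical code, not the actual
s3a/s3n implementation, and it skips collapsing sub-prefixes into child
directory entries): the prefix /a/b/c/ becomes a single range query, so the
cost is proportional to the number of matching keys rather than the size of
the bucket.
{code:java}
import java.util.*;

// Sketch of the listStatus trick described above -- not the real s3a/s3n
// code, just the idea: a directory listing becomes one key-range query.
class FakeDirListing {
  static List<String> listStatus(NavigableMap<String, byte[]> keys, String dir) {
    String from = dir.endsWith("/") ? dir : dir + "/";      // e.g. "/a/b/c/"
    String to = from.substring(0, from.length() - 1) + '0'; // '/' + 1 == '0'
    // One sub-linear range scan instead of scanning the whole bucket.
    return new ArrayList<>(keys.subMap(from, true, to, false).keySet());
  }

  public static void main(String[] args) {
    NavigableMap<String, byte[]> keys = new TreeMap<>();
    keys.put("/a/b/c/file1", new byte[0]);
    keys.put("/a/b/c/file2", new byte[0]);
    keys.put("/a/b/d/other", new byte[0]);
    System.out.println(listStatus(keys, "/a/b/c")); // [/a/b/c/file1, /a/b/c/file2]
  }
}
{code}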
> Object store in HDFS
> --------------------
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
> Issue Type: New Feature
> Reporter: Jitendra Nath Pandey
> Assignee: Jitendra Nath Pandey
> Attachments: Ozone-architecture-v1.pdf
>
>
> This jira proposes to add object store capabilities into HDFS.
> As part of the federation work (HDFS-1052) we separated block storage as a
> generic storage layer. Using the Block Pool abstraction, new kinds of
> namespaces can be built on top of the storage layer, i.e., the datanodes.
> In this JIRA I will explore building an object store using the datanode
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)