[
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14504550#comment-14504550
]
Jitendra Nath Pandey commented on HDFS-7240:
--------------------------------------------
Thanks for the feedback and comments. I will try to answer the questions over
my next few comments, and I will also update the document to reflect the
discussion here.

The stated limits in the document are design goals and parameters we have in
mind for the first phase of the project. They are not hard limits, and most of
them will be configurable. First I will state a few technical limits, and then
describe the back-of-the-envelope calculations and heuristics behind these
numbers.
The technical limitations are the following.
# The memory in the storage container manager limits the number of storage
containers. From namenode experience, I believe we can go up to a few hundred
million storage containers. In later phases of the project we can introduce a
federated architecture with multiple storage container managers for further
scale-up.
# The size of a storage container is limited by how quickly we want to
re-replicate the containers when a datanode goes down. The advantage of a large
container size is that it reduces the metadata needed to track container
locations, which is proportional to the number of containers. However, a very
large container reduces the parallelism the cluster can achieve when
re-replicating after a node failure. The container size will be configurable. A
default size of 10G, which is much larger than HDFS block sizes, seems like a
good choice; it still allows hundreds of containers on datanodes with a few
terabytes of disk (a rough sketch of the numbers in both items follows this
list).
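To make the heuristics above concrete, here is a minimal back-of-the-envelope
sketch. The ~200 bytes of in-memory metadata per container is an assumed figure
based on namenode block-metadata experience, not a measurement, and the 300
million containers and 4TB-per-datanode inputs are illustrative; decimal units
are used for simplicity.
{code:java}
// Back-of-the-envelope sketch only; BYTES_PER_CONTAINER is an assumed figure.
public class ContainerEstimates {
  public static void main(String[] args) {
    long containers = 300_000_000L;          // "a few hundred million" containers
    long bytesPerContainer = 200L;           // assumed in-memory metadata per container
    System.out.printf("SCM heap for container map: ~%d GB%n",
        containers * bytesPerContainer / 1_000_000_000L);    // ~60 GB

    long diskPerDatanodeGB = 4_000L;         // a datanode with a few TB of disk
    long containerSizeGB = 10L;              // default 10G container size
    System.out.printf("Containers per datanode: ~%d%n",
        diskPerDatanodeGB / containerSizeGB); // ~400, so re-replication after a
                                              // node failure can fan out widely
  }
}
{code}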
The maximum size of an object is stated as 5G. In the future we would like to
increase this limit further, once we can support multi-part writes similar to
S3. However, the average object size is expected to be much smaller; the most
common range is expected to be a few hundred KBs to a few hundred MBs. Assuming
100 million containers of 10G each, the raw capacity is about 1EB; at an
average object size of 1MB that is on the order of a trillion objects, and the
bucket limits below allow up to 10 trillion. I think 10 trillion is a lofty
goal to have : ). The division into 10 million buckets with a million objects
in each bucket is somewhat arbitrary, but we believe users will prefer smaller
buckets for better organization. We will keep these configurable.
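To make that arithmetic explicit, here is a minimal sketch using only the
numbers stated above (decimal units for simplicity, so 10G is taken as 10^10
bytes):
{code:java}
// Sketch of the capacity arithmetic from the numbers above (decimal units).
public class ObjectCountEstimates {
  public static void main(String[] args) {
    long containers = 100_000_000L;              // 100 million containers
    long containerBytes = 10_000_000_000L;       // 10G per container
    long avgObjectBytes = 1_000_000L;            // 1MB average object size

    long rawCapacity = containers * containerBytes;           // ~1 EB
    System.out.println("Raw capacity (bytes): " + rawCapacity);
    System.out.println("Objects at 1MB average: "
        + rawCapacity / avgObjectBytes);         // ~1e12, about a trillion

    long buckets = 10_000_000L;                  // 10 million buckets
    long objectsPerBucket = 1_000_000L;          // 1 million objects per bucket
    System.out.println("Objects allowed by bucket limits: "
        + buckets * objectsPerBucket);           // 1e13, i.e. 10 trillion
  }
}
{code}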
The storage volume settings give admins control over storage usage. In a
private cloud, a cluster shared by many tenants can have a storage volume
dedicated to each tenant, where a tenant can be a user, a project, or a group
of users. Therefore, a limit of 1000 buckets, implying around 1PB of storage
per tenant, seems reasonable. But I do agree that once we have a quota on the
storage volume size, an additional limit on the number of buckets is not really
needed.
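For reference, the 1PB-per-tenant figure follows directly from the per-bucket
numbers above; a quick sketch of that multiplication (decimal units again):
{code:java}
// 1000 buckets x 1 million objects x 1MB average object size ~= 1 PB per tenant.
public class TenantQuotaEstimate {
  public static void main(String[] args) {
    long bucketsPerVolume = 1_000L;
    long objectsPerBucket = 1_000_000L;
    long avgObjectBytes = 1_000_000L;
    long perTenantBytes = bucketsPerVolume * objectsPerBucket * avgObjectBytes;
    System.out.println("Per-tenant capacity (bytes): " + perTenantBytes); // 1e15 = 1 PB
  }
}
{code}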
We plan to carry out the project in several phases. I would like to propose the
following phases:
Phase 1
# Basic API as covered in the document.
# Storage container machinery, reliability, replication.
Phase 2
# High availability
# Security
# Secondary index for object listing with prefixes.
Phase 3
# Caching to improve latency.
# Further scalability in terms of number of objects and object sizes.
# Cross-geo replication.
I have created branch HDFS-7240 for this work. We will start filing jiras and
posting patches.
> Object store in HDFS
> --------------------
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
> Issue Type: New Feature
> Reporter: Jitendra Nath Pandey
> Assignee: Jitendra Nath Pandey
> Attachments: Ozone-architecture-v1.pdf
>
>
> This jira proposes to add object store capabilities into HDFS.
> As part of the federation work (HDFS-1052) we separated block storage as a
> generic storage layer. Using the Block Pool abstraction, new kinds of
> namespaces can be built on top of the storage layer, i.e., the datanodes.
> In this jira I will explore building an object store using the datanode
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.