[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14504550#comment-14504550
 ] 

Jitendra Nath Pandey commented on HDFS-7240:
--------------------------------------------

Thanks for the feedback and comments. I will try to answer the questions over 
my next few comments. I will also update the document to reflect the discussion 
here.

  The stated limits in the document are design goals and parameters we have 
in mind while designing the first phase of the project. They are not hard 
limits, and most of them will be configurable. First I will state a few 
technical limits, and then describe some back-of-the-envelope calculations and 
heuristics behind these numbers.
  The technical limitations are as follows.
  # The memory in the storage container manager limits the number of storage 
containers. From the namenode experience, I believe we can go up to a few 
hundred million storage containers. In later phases of the project we can have 
a federated architecture with multiple storage container managers for further 
scale.
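To make the memory bound concrete, here is a rough heap-size sketch behind the "few hundred million containers" estimate. The ~200 bytes of metadata per container is a hypothetical figure (by analogy with per-block overhead on the namenode), not a number from the design document:

```python
# Rough SCM heap estimate. bytes_per_container is an assumed figure,
# analogous to per-block metadata overhead on the namenode.
bytes_per_container = 200

for containers in (100_000_000, 300_000_000):
    heap_gb = containers * bytes_per_container / 1024**3
    print(f"{containers:,} containers -> ~{heap_gb:.0f} GB of SCM heap")
```

At a few hundred million containers this lands in the tens of gigabytes of heap, which is in line with what large namenodes already manage today.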
  # The size of a storage container is limited by how quickly we want to 
re-replicate containers when a datanode goes down. The advantage of a large 
container size is that it reduces the metadata needed to track container 
locations, which is proportional to the number of containers. However, a very 
large container reduces the parallelism the cluster can achieve when 
re-replicating after a node failure. The container size will be configurable. 
A default of 10G seems like a good choice: it is much larger than HDFS block 
sizes, but still allows hundreds of containers on datanodes with a few 
terabytes of disk.
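The tradeoff can be sketched numerically. The function and cluster parameters below (4TB disks, a 1000-node cluster) are illustrative assumptions, not figures from the design document:

```python
def container_tradeoff(disk_tb, container_gb, cluster_nodes):
    """Containers per node, and the re-replication fan-out after a node dies."""
    containers_per_node = (disk_tb * 1024) // container_gb
    # Each lost container can be re-replicated from a different source
    # replica, so parallelism is bounded by the smaller of the container
    # count and the number of surviving nodes.
    fanout = min(containers_per_node, cluster_nodes - 1)
    return containers_per_node, fanout

for size_gb in (1, 10, 100):
    per_node, fanout = container_tradeoff(disk_tb=4, container_gb=size_gb,
                                          cluster_nodes=1000)
    print(f"{size_gb:>3} GB containers: {per_node} per node, "
          f"re-replication fan-out ~{fanout}")
```

At 10G per container, a 4TB datanode holds roughly 400 containers, so recovery of a failed node still fans out across hundreds of sources; at 100G it drops to a few dozen.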

  The maximum size of an object is stated as 5G. In the future we would like 
to increase this limit further when we can support multi-part writes similar 
to S3. However, the average object size is expected to be much smaller; the 
most common range is expected to be a few hundred KBs to a few hundred MBs.
  Assuming 100 million containers of 10G each, the raw capacity is about an 
exabyte; at a 1MB average object size that amounts to roughly a trillion 
objects, and at the few-hundred-KB end of the expected range it approaches 10 
trillion. I think 10 trillion is a lofty goal to have : ). The division of 10 
trillion into 10 million buckets with a million objects in each bucket is 
somewhat arbitrary, but we believe users will prefer smaller buckets for 
better organization. We will keep these configurable. 
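A quick check of this arithmetic, using the figures above and both ends of the expected object-size range:

```python
# Capacity arithmetic: 100 million containers of 10G each, with average
# object sizes at both ends of the expected range.
KB, MB, GB = 1024, 1024**2, 1024**3

containers = 100_000_000
total_bytes = containers * 10 * GB            # ~1 exabyte of raw capacity

objects_at_1mb = total_bytes // MB            # ~1 trillion objects
objects_at_100kb = total_bytes // (100 * KB)  # ~10 trillion objects
buckets = objects_at_100kb // 1_000_000       # ~10 million buckets

print(f"{objects_at_1mb:.3e} objects at 1MB average")
print(f"{objects_at_100kb:.3e} objects at 100KB average")
print(f"{buckets:,} buckets of a million objects each")
```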

  The storage volume settings give admins control over storage usage. In a 
private cloud, a cluster shared by many tenants can have a storage volume 
dedicated to each tenant, where a tenant can be a user, a project, or a group 
of users. Therefore, a limit of 1000 buckets, implying around 1PB of storage 
per tenant, seems reasonable. That said, I agree that once we have a quota on 
the storage volume size, an additional limit on the number of buckets is not 
really needed.
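The 1PB-per-tenant figure follows directly from the earlier numbers (1000 buckets per storage volume, a million objects per bucket, ~1MB average object size):

```python
# Per-tenant storage implied by the bucket limits discussed above.
MB, PB = 1024**2, 1024**5

per_tenant = 1000 * 1_000_000 * 1 * MB   # buckets * objects/bucket * avg size
print(f"~{per_tenant / PB:.2f} PB per storage volume")
```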

  We plan to carry out the project in several phases. I would like to propose 
the following phases:

  Phase 1
   # Basic API as covered in the document.
   # Storage container machinery, reliability, replication.

  Phase 2
   # High availability
   # Security
   # Secondary index for object listing with prefixes.

  Phase 3
   # Caching to improve latency.
   # Further scalability in terms of number of objects and object sizes.
   # Cross-geo replication.

I have created branch HDFS-7240 for this work. We will start filing jiras and 
posting patches. 

> Object store in HDFS
> --------------------
>
>                 Key: HDFS-7240
>                 URL: https://issues.apache.org/jira/browse/HDFS-7240
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Jitendra Nath Pandey
>            Assignee: Jitendra Nath Pandey
>         Attachments: Ozone-architecture-v1.pdf
>
>
> This jira proposes to add object store capabilities into HDFS. 
> As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
