[
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14504691#comment-14504691
]
Jitendra Nath Pandey commented on HDFS-7240:
--------------------------------------------
[~clamb], thanks for the detailed review and feedback.
Some answers are below; for the others I will post an updated document with
the details you have pointed out.
bq. Is the 1KB key size limit a hard limit or just a design/implementation
target
It is a design target. Amazon's S3 limits keys to 1KB, and I doubt there would
be many use cases that need more. I see the point of allowing graceful
degradation instead of a hard limit. But at this point in the project I would
prefer to set stricter limits and relax them later, rather than set user
expectations too high to begin with.
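To illustrate the strict-limit option, here is a minimal sketch of a key-size
check. The class name KeyLimits and the reading of "1KB" as 1024 bytes of UTF-8
are assumptions for illustration, not part of the design document.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: enforces a strict upper bound on object key size. */
final class KeyLimits {
    // Assumption: "1KB" is interpreted as 1024 bytes of the UTF-8 encoded key.
    static final int MAX_KEY_BYTES = 1024;

    private KeyLimits() {}

    /** Returns true if the key is non-empty and fits within the limit. */
    static boolean isValidKey(String key) {
        if (key == null || key.isEmpty()) {
            return false;
        }
        return key.getBytes(StandardCharsets.UTF_8).length <= MAX_KEY_BYTES;
    }
}
```

Relaxing the limit later would then only mean raising MAX_KEY_BYTES; keys that
were accepted before stay valid.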
bq. Caching to reduce network traffic
I agree that a good caching layer will significantly help performance. The
Ozone handler seems like a natural place for caching. However, a thick client
can also do its own caching without overloading the datanodes. The focus of
phase 1 is to get the semantics right and put the basic architecture in place.
We plan to attack performance improvements in a later phase of the project.
bq. Security mechanisms
Frankly, I haven't thought about anything other than Kerberos. I agree, we
should evaluate it against what other popular object stores use.
bq. Hot spots in hash partitioning.
Hot spots are possible for a pathological sequence of keys, but in practice
hash partitioning has been used successfully to avoid hot spots, e.g.
hash-partitioned indexes in databases. We would need to pick hash functions
with good distribution properties.
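As a rough illustration of hash partitioning, the sketch below maps a key to
one of N partitions. The class name HashPartitioner, the choice of MD5 as the
well-mixing hash, and the use of the first four digest bytes are all
assumptions made for the example; the actual hash function is still to be
picked.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical sketch: maps object keys to partitions by hashing. */
final class HashPartitioner {
    private HashPartitioner() {}

    /** Returns a partition index in the range [0, numPartitions). */
    static int partitionFor(String key, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        try {
            // MD5 is used here only for its mixing properties, not for security.
            byte[] d = MessageDigest.getInstance("MD5")
                    .digest(key.getBytes(StandardCharsets.UTF_8));
            // Fold the first four digest bytes into a 32-bit integer.
            int h = ((d[0] & 0xff) << 24) | ((d[1] & 0xff) << 16)
                    | ((d[2] & 0xff) << 8) | (d[3] & 0xff);
            // floorMod keeps the result non-negative even when h is negative.
            return Math.floorMod(h, numPartitions);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
```

With a hash like this, lexicographically adjacent keys such as "log-0001" and
"log-0002" are unlikely to land in the same partition, which is what avoids
hot spots for sequential writes.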
bq. Secondary indexing consistency
The secondary index need not be strictly consistent with the bucket. That means
a listing operation with a prefix or key range may not reflect the latest state
of the bucket. We will have a more concrete proposal in the second phase of the
project.
bq. Storage volume GET for admin
I had assumed that allowing users to see all storage volume names is not a
security concern. However, it is possible to conceive of a use case where an
admin would want to restrict that. We can probably support both modes.
bq. "no guarantees on partially written objects"
The object will not be visible until it is completely written. Also, no recovery
is planned for the first phase if the write fails. In the future, we would like
to support multi-part uploads.
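One common way to get "not visible until completely written" on a local file
system is to write to a temporary file and rename it into place. The sketch
below shows that technique only as an illustration; the class name
AtomicObjectWriter, the "obj-" and ".tmp" naming, and the use of a plain
directory as the container are assumptions, not the planned implementation.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical sketch: publish an object only after it is fully written. */
final class AtomicObjectWriter {
    private AtomicObjectWriter() {}

    /**
     * Writes data to a temporary file in dir, then atomically renames it to
     * its final name. Readers that look up the final name see either nothing
     * or the complete object, never a partial write.
     */
    static void putObject(Path dir, String name, byte[] data) throws IOException {
        Path tmp = Files.createTempFile(dir, "obj-", ".tmp");
        try {
            Files.write(tmp, data);
            Files.move(tmp, dir.resolve(name), StandardCopyOption.ATOMIC_MOVE);
        } finally {
            // No recovery: if the write or the move failed, drop the partial file.
            // After a successful move the temporary path no longer exists.
            Files.deleteIfExists(tmp);
        }
    }
}
```

A listing of the directory would have to skip names ending in ".tmp", and
whether an existing target is replaced by ATOMIC_MOVE is file-system specific.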
bq. Re-using block management implementation for container management.
We intend to reuse the DatanodeProtocol that the datanode uses to talk to the
namenode. I will add more details to the document and to the corresponding jira.
bq. storage container prototype using leveldbjni
We will add a lot more detail on this in its own jira. The idea is to use
leveldbjni in the storage container in the datanodes. We plan to prototype a
storage container that stores objects as individual files within the container.
That, however, would need an index within the container to map a key to a file,
and we will use leveldbjni for that index.
Another possible prototype is to put the entire object in leveldbjni itself. It
will take some experimentation to zero in on the right approach. We will also
try to make the storage container implementation pluggable, to make it easy to
try different implementations.
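To make the two prototypes and the pluggability idea concrete, here is a
minimal sketch. The interface StorageContainer and both class names are
hypothetical, and plain in-memory maps stand in for both leveldbjni and the
files on disk; no real leveldbjni API is used.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical pluggable contract for a storage container. */
interface StorageContainer {
    void put(String key, byte[] value);

    /** Returns the stored bytes, or null when the key is absent. */
    byte[] get(String key);
}

/** Prototype 1: one "file" per object, plus an index mapping key to file. */
final class IndexedFileContainer implements StorageContainer {
    // Stand-in for the leveldbjni index (key -> file name).
    private final Map<String, String> index = new HashMap<>();
    // Stand-in for the individual files inside the container.
    private final Map<String, byte[]> files = new HashMap<>();
    private long nextFileId = 0;

    @Override
    public synchronized void put(String key, byte[] value) {
        String fileName = "obj-" + (nextFileId++);
        files.put(fileName, value.clone());
        // Update the index last, then drop the file of any overwritten object.
        String old = index.put(key, fileName);
        if (old != null) {
            files.remove(old);
        }
    }

    @Override
    public synchronized byte[] get(String key) {
        String fileName = index.get(key);
        return fileName == null ? null : files.get(fileName).clone();
    }
}

/** Prototype 2: the whole object is stored in the key-value store itself. */
final class EmbeddedContainer implements StorageContainer {
    // Stand-in for leveldbjni holding key -> object bytes.
    private final Map<String, byte[]> store = new HashMap<>();

    @Override
    public synchronized void put(String key, byte[] value) {
        store.put(key, value.clone());
    }

    @Override
    public synchronized byte[] get(String key) {
        byte[] v = store.get(key);
        return v == null ? null : v.clone();
    }
}
```

Callers that depend only on StorageContainer can switch between the two
variants, which is what would make the experimentation cheap.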
bq. How are quotas enabled and set? ....who enforces them
All the Ozone APIs are implemented in the Ozone handler. The quota will also be
enforced by the Ozone handler. I will update the document with the APIs.
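As a sketch of what enforcement at a single point could look like, the class
below tracks bytes used against a limit and rejects writes that would exceed
it. The class name VolumeQuota, the byte-based accounting, and the
reserve/release methods are hypothetical; the real quota APIs will be
described in the updated document.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch: byte quota checked before a write is accepted. */
final class VolumeQuota {
    private final long limitBytes;
    private final AtomicLong usedBytes = new AtomicLong();

    VolumeQuota(long limitBytes) {
        if (limitBytes < 0) {
            throw new IllegalArgumentException("limitBytes must be non-negative");
        }
        this.limitBytes = limitBytes;
    }

    /** Reserves space for a write; returns false if it would exceed the quota. */
    boolean tryReserve(long bytes) {
        if (bytes < 0) {
            return false;
        }
        while (true) {
            long used = usedBytes.get();
            // Compare against the remaining space to avoid long overflow.
            if (bytes > limitBytes - used) {
                return false;
            }
            if (usedBytes.compareAndSet(used, used + bytes)) {
                return true;
            }
        }
    }

    /** Returns previously reserved space, e.g. after a delete or a failed write. */
    void release(long bytes) {
        usedBytes.addAndGet(-bytes);
    }

    long usedBytes() {
        return usedBytes.get();
    }
}
```

A handler would call tryReserve before accepting a write and release when an
object is deleted or a write fails.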
> Object store in HDFS
> --------------------
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
> Issue Type: New Feature
> Reporter: Jitendra Nath Pandey
> Assignee: Jitendra Nath Pandey
> Attachments: Ozone-architecture-v1.pdf
>
>
> This jira proposes to add object store capabilities into HDFS.
> As part of the federation work (HDFS-1052) we separated block storage as a
> generic storage layer. Using the Block Pool abstraction, new kinds of
> namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.