elek commented on a change in pull request #1419:
URL: https://github.com/apache/ozone/pull/1419#discussion_r520461837
##########
File path: hadoop-hdds/docs/content/design/storage-class.md
##########
@@ -19,10 +19,331 @@ author: Marton Ele
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
+
+
# Abstract
-Proposal suggest to introduce a new storage-class abstraction which can be
used to define different replication strategies (factor, type, ...) for
different bucket/keys.
+One of the fundamental abstractions of Ozone is the _Container_, which is used as the unit of replication.
+Containers have two flavors: _Open_ and _Closed_. Open containers are writable and replicated by Ratis; Closed containers are read-only and replicated by data copy.
+
+In this document a new level of abstraction is proposed: the *storage class*, which defines which type of containers should be used and which transitions are supported.
+
+# Goals / Use cases
+
+## [USER] Simplify user interface and improve usability
+
+Users can choose from an admin-provided set of storage classes (for example `STANDARD`, `REDUCED`) instead of using implementation-specific terms (`RATIS/THREE`, `RATIS/ONE`).
+
+Today, users must use implementation-specific terms when a key is created:
+
+```
+ozone sh key put --replication=THREE --type=RATIS /vol1/bucket1/key1 source-file.txt
+```
+
+There are two problems here:
+
+ 1. Users must use low-level, technical terms. A user might not know what `RATIS` is and may not have enough information to decide on the right replication scheme.
+
+ 2. The current settings apply only to the *open* containers. There is no easy way to add configuration which can be used later during the lifecycle of containers/keys (for example, to support a `Ratis/THREE` --> `Ratis/TWO` transition).
+
+With the storage-class abstraction, the complexity of configuration can be moved to the admin side (with more flexibility), and users only need to choose from the available storage-classes (or use the default one).
+
+Instead of the earlier CLI, this document proposes an abstract storage-class parameter:
+
+```
+ozone sh key put --storage-class=STANDARD /vol1/bucket1/key1 source-file.txt
+```
+
+## [USER] Set a custom replication for a newly created bucket
+
+A user may want to set a custom replication for a bucket at the time of creation. All keys in the bucket will respect the specified storage class (subject to storage and quota availability). E.g.:
+
+```
+ozone sh bucket create --storage-class=INFREQUENT_ACCESS /vol1/bucket1
+```
+
+
+The bucket-level storage-class can be overridden for any key, but it will be used as the default.
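+
+For illustration (a sketch only; the flag names follow the proposal above and are not final):
+
+```
+# keys inherit the bucket default unless a storage-class is given explicitly
+ozone sh bucket create --storage-class=INFREQUENT_ACCESS /vol1/bucket1
+ozone sh key put /vol1/bucket1/key1 source-file.txt
+# key1 uses INFREQUENT_ACCESS; an explicit flag overrides the default:
+ozone sh key put --storage-class=STANDARD /vol1/bucket1/key2 source-file.txt
+```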
+
+
+## [USER] Fine grained replication control when using S3 API
+
+A user may want to set a custom replication policy for any key **uploaded via the S3 API**. Storage-classes are already part of the AWS S3 API. With first-class support for the same concept in Ozone, users can choose from the predefined storage-classes (= replication rules) using the AWS API:
+
+
+```
+aws s3 cp --storage-class=REDUCED file1 s3://bucket/file1
+```
+
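+To verify the class of an uploaded key, the standard S3 call can be used (assuming the Ozone S3 Gateway reports the storage class in the object metadata):
+
+```
+aws s3api head-object --bucket bucket --key file1
+```
+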
+
+## [USER] Set the replication for a specific prefix
+
+A user may want to set a custom replication for a specific key prefix. All keys matching that prefix will respect the specified storage class. This operation will not affect keys already under the prefix (question: should this be supported with data movement?).
+
+```
+ozone sh prefix setClass --storage-class=REDUCED /vol1/bucket1/tmp
+```
+
+The prefix-level storage-class can be overridden for any key, but it will be used as the default.
+
+## [ADMIN/DEV] Support multiple replication schemes
+
+Today there are two replication schemes, which are hard-coded. The storage-class abstraction extends this behavior to support any number of replication schemes.
+
+Keys (and containers) can be categorized by storage-class, which determines the replication scheme.
+
+## [ADMIN/USER] Flexible administration
+
+As mentioned above, today it's hard to configure the details of replication at the key/bucket level. The only thing we can define is the replication type for open containers (RATIS/THREE or RATIS/ONE), which determines the later lifecycle of the keys/containers.
+
+Any more specific replication configuration can be set only at the cluster level, not at the key level.
+
+A storage-class can define all the parameters for the specific containers/keys:
+
+As an example, this could be a storage-class definition:
+
+```
+name: STANDARD
+states:
+   - name: open
+     replicationType: RATIS
+     replicationFactor: THREE
+   - name: closed
+     replicationType: COPY
+     replicationFactor: TWO
+     rackPolicy: different
+     transitions:
+        - target: ec
+          trigger:
+             ratio: 90%
+             used: 30d
+   - name: ec
+     codec: Reed-Solomon
+     scheme:
+        data: 6
+        parity: 3
+```
+
+This defines a replication scheme where two replicas are enough for closed containers, and a container will be erasure-coded under the hood if 90% of its content has not been used in the last 30 days.
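+
+The transition trigger above could be evaluated roughly as follows (a hypothetical sketch; `Container` and its accessors are illustrative names, not existing Ozone APIs):
+
+```
+import java.time.Duration;
+import java.time.Instant;
+
+// Illustrative only: these types do not exist in Ozone.
+interface Container {
+  long usedBytes();
+  long bytesNotAccessedSince(Instant cutoff);
+}
+
+final class EcTransitionTrigger {
+  // True when at least 90% of the container's data has been idle for 30 days,
+  // matching the `ratio: 90%` / `used: 30d` trigger in the definition above.
+  static boolean shouldTransition(Container c) {
+    Instant cutoff = Instant.now().minus(Duration.ofDays(30));
+    return c.bytesNotAccessedSince(cutoff) >= 0.9 * c.usedBytes();
+  }
+}
+```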
+
+Please note that:
+
+ * All the low-level details of the replication rules can be configured here by the administrators.
+ * Configuration is neither global nor cluster-level; one can have different configuration for different storage-classes (which means for different keys/containers). See the example below.
+ * Users don't need to deal with these details, as they can use the storage-class abstraction (or just use pre-created buckets with their default storage-class).
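+
+As a second, hypothetical example, a `REDUCED` storage-class could keep a single replica and define no further transitions:
+
+```
+name: REDUCED
+states:
+   - name: open
+     replicationType: RATIS
+     replicationFactor: ONE
+   - name: closed
+     replicationType: COPY
+     replicationFactor: ONE
+```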
+
+## [DEV] Give flexibility to the developers
+
+The storage-class abstraction provides an easy way to plug in new replication schemes. New types of replication (like EC) can be supported easily, as the system will be prepared to allocate different types of containers.
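+
+As a rough illustration (hypothetical types, not an existing Ozone API), a pluggable replication scheme could look like this:
+
+```
+import java.util.Map;
+
+// Illustrative sketch only: none of these types exist in Ozone.
+interface Pipeline {}                    // placeholder for a write pipeline
+final class ContainerId {
+  final long id;
+  ContainerId(long id) { this.id = id; }
+}
+
+// A replication scheme answers the "how to write / how to recover" questions.
+interface ReplicationScheme {
+  Pipeline allocatePipeline();           // how to write: choose datanodes
+  void reconstruct(ContainerId id);      // how to recover lost replicas
+}
+
+// A storage-class maps container states (open, closed, ec, ...) to schemes,
+// so a new scheme such as EC can be plugged in without touching callers.
+final class StorageClass {
+  private final String name;
+  private final Map<String, ReplicationScheme> states;
+
+  StorageClass(String name, Map<String, ReplicationScheme> states) {
+    this.name = name;
+    this.states = states;
+  }
+
+  String name() { return name; }
+
+  ReplicationScheme schemeFor(String state) {
+    return states.get(state);
+  }
+}
+```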
+
+## [ADMIN] Better upgrade support
+
+Let's imagine that a new type of Open container replication is introduced (`RATIS-STREAM/THREE` instead of `RATIS/THREE`). If storage-classes are stored with the keys and containers instead of the direct replication rules, we can:
+
+ 1. Easily change the replication method of existing buckets/keys (see the sketch below)
+ 2. Turn on experimental features for specific buckets
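+
+For example, rolling out the hypothetical `RATIS-STREAM` replication could be as simple as editing the `open` state of the class definition (sketch):
+
+```
+name: STANDARD
+states:
+   - name: open
+     replicationType: RATIS-STREAM   # was: RATIS
+     replicationFactor: THREE
+   # closed/ec states unchanged
+```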
+
+
+## [ADMIN] Change the cluster-wide replication
+
+An admin may decide to set a custom policy for an entire cluster.
+
+```
+ozone sh prefix setClass --storage-class=EC_6_3 /
+```
+
+# Unsupported use cases
+
+The following use cases are specifically unsupported.
+
+## [USER] Change the replication policy for a pre-existing key
+
+Changing the replication policy of a pre-existing key would require data movement and rewriting containers, and hence it is unsupported.
+
+## [USER] Defining storage-classes using Hadoop Compatible File System
interface
+
+It's not possible to define a storage-class (or any replication rule) using the *Hadoop Compatible File System* interface. However, a storage-class defined at the bucket level (or prefix level) will be inherited, even if the keys are created via the `o3fs://` or `o3s://` interfaces.
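+
+For example, a key written through `o3fs://` would pick up the bucket default (a sketch, reusing the `INFREQUENT_ACCESS` bucket from above):
+
+```
+hdfs dfs -put source-file.txt o3fs://bucket1.vol1/key1
+# key1 inherits the bucket's storage-class (INFREQUENT_ACCESS)
+```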
+
+# The storage-class as an abstraction
+
+The previous section explained some user-facing properties of the storage-class concept. This section explains the concept in relation to the existing Ozone design.
+
+## Containers in more detail
+
+The Container is the unit of replication in Ozone. One Container can store multiple blocks (the default container size is 5GB), and they are replicated together. Datanodes report only the replication state of the Containers back to the Storage Container Manager (SCM), which makes it possible to scale up to billions of objects.
+
+The identifier of a block (BlockId) contains a ContainerId and a LocalId (the ID inside the container). The ContainerId can be used to find the right Datanode which stores the data; the LocalId can be used to find the data inside one container.
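+
+As a simplified sketch (not the actual Ozone classes), block addressing works roughly like this:
+
+```
+// Illustrative only; the real Ozone classes differ.
+final class BlockId {
+  final long containerId;  // which container (and thus which datanodes)
+  final long localId;      // which block inside that container
+
+  BlockId(long containerId, long localId) {
+    this.containerId = containerId;
+    this.localId = localId;
+  }
+}
+
+// Reading a block (pseudo-steps):
+//  1. ask SCM for the datanodes of blockId.containerId
+//  2. ask one of those datanodes for the block with blockId.localId
+```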
+
+Container type defines the following:
+
+ * How to write to the containers?
+ * How to read from the containers?
+ * How to recover / replicate data in case of error?
+ * How to store the data on the Datanode? (related to the *how to write* question)
+
+The current definition of *Ratis/THREE* is the following (simplified version):
+
+ * **How to write**: Call the standard Datanode RPC API on the *Leader*. The Leader will replicate the data to the followers.
Review comment:
> It also defines how recovery from failures must be done
Agree, this is included in the line:
_How to recover / replicate data in case of error_
> how containers must be closed
Good point, and it's very important. I think there are two possible ways to think about it:
    * We have OPEN containers and CLOSED containers, and the *replication mechanism* defines the transitions between them
    * We have one NORMAL type of container with two states (Open and Closed)
Technically both are the same (IMHO), just explained in different ways. This document follows the first approach, but I added more text to clarify it.