[
https://issues.apache.org/jira/browse/KUDU-3413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
dengke updated KUDU-3413:
-------------------------
Attachment: data_and_metadata.png
> Kudu multi-tenancy
> ------------------
>
> Key: KUDU-3413
> URL: https://issues.apache.org/jira/browse/KUDU-3413
> Project: Kudu
> Issue Type: New Feature
> Reporter: dengke
> Assignee: dengke
> Priority: Major
> Attachments: data_and_metadata.png, kudu table topology.png
>
>
> h1. 1.Definition
> Tenant: A user of the cluster can be called a tenant. Tenants may be divided
> by project or by application. Each tenant is equivalent to a resource pool,
> and all users under a tenant share the resources of that pool. Multiple
> tenants share the resources of one cluster.
> User: A consumer of cluster resources.
> Multi-tenancy: Isolation is enforced at the database level, so tenants cannot
> access each other's data, and each tenant's resources are private and
> independent. (Note: Kudu has no concept of a database; here it can be
> understood simply as a set of tables.)
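To make the definitions above concrete, here is a minimal Python sketch of a tenant as a resource pool. The `Tenant` class, the `can_access` helper and the sample tenant, user and table names are all hypothetical illustrations; none of them exist in Kudu.

```python
from dataclasses import dataclass, field


@dataclass
class Tenant:
    """A resource pool shared by all of its users (hypothetical model)."""
    name: str
    users: set[str] = field(default_factory=set)
    tables: set[str] = field(default_factory=set)


def can_access(tenants: list[Tenant], user: str, table: str) -> bool:
    """Return True only if `user` and `table` belong to the same tenant."""
    return any(user in t.users and table in t.tables for t in tenants)


# Sample data, invented for illustration only.
tenants = [
    Tenant("ads", users={"alice", "bob"}, tables={"ads.clicks"}),
    Tenant("billing", users={"carol"}, tables={"billing.invoices"}),
]
```

Under this model, `alice` and `bob` share every table of the `ads` tenant, while `carol` cannot reach any of them.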
> h1. 2.Current situation
> The latest version of Kudu implements 'data at rest encryption', which mainly
> covers cluster-level authentication and encryption and data storage
> encryption at the single-server level. This meets the needs of basic
> encryption scenarios, but it still falls short of the tenant-level
> encryption we are pursuing.
> h1. 3.Outline design
> In general, tenant-level encryption differs from cluster-level encryption in
> the following ways:
> * Tenant-level encryption requires data storage isolation, which means the
> data of different tenants needs to be separated (a new namespace layer may
> be added to the storage topology, with the data of each tenant stored under
> its own namespace path to minimize mutual impact);
> * The generation and use of tenants' keys. In a multi-tenant scenario, we
> need to replace the cluster key with the tenant key.
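The two differences listed above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `<data_root>/<tenant>/<block_id>` layout, the function names `block_path` and `encryption_key`, and the in-memory key table are hypothetical and are not part of Kudu or of this proposal's final design.

```python
from pathlib import PurePosixPath


def block_path(data_root: str, tenant: str, block_id: str) -> PurePosixPath:
    """Place every block of a tenant under that tenant's namespace directory.

    Assumed layout (hypothetical): <data_root>/<tenant>/<block_id>.
    """
    if not tenant or "/" in tenant or tenant in (".", ".."):
        raise ValueError(f"invalid tenant name: {tenant!r}")
    return PurePosixPath(data_root) / tenant / block_id


def encryption_key(tenant_keys: dict[str, bytes], tenant: str) -> bytes:
    """Select the tenant's own key; there is no cluster-wide fallback key."""
    try:
        return tenant_keys[tenant]
    except KeyError:
        raise KeyError(f"no key registered for tenant {tenant!r}") from None
```

Keeping the tenant name as a path component means that all data of one tenant sits under a single directory, and refusing to fall back to a shared key means a missing tenant key fails loudly instead of silently reusing another key.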
> h1. 4.Design
> h2. 4.1 Namespace
> In the storage industry, a namespace is mainly used to maintain file
> attributes, the directory tree structure and other metadata of the file
> system, and is compatible with POSIX directory trees and file operations.
> It is a core concept in file storage.
> Taking the widely used HDFS as an example, its namespace achieves resource
> isolation mainly by logically partitioning the disk, attaching the
> partitions to different directories, and finally modifying the permissions
> of the directory owners.
> In Kudu, the current storage topology is relatively mature, and a Kudu
> client's read/write requests must be processed by the tserver before the
> corresponding data can be obtained. The requests do not manipulate raw data
> directly; that is, the client is not aware of how data is distributed in
> the storage engine, so there is a natural degree of data isolation. However,
> the data in the storage engine is intermingled, and in some extreme cases
> tenants can still affect each other. The best solution would be to
> completely separate the read/write, compaction and other processing paths of
> different tenants. However, that requires many changes and may lead to
> system instability. Instead, we can make minimal changes to achieve physical
> isolation of data by tenant.
>
> First, we need to analyze the current storage topology: a table in Kudu
> is divided into multiple tablets (partitions). Each tablet includes
> metadata and several RowSets. The RowSets consist of a
> 'MemRowSet' (corresponding to the data in memory) and multiple
> 'DiskRowSets' (corresponding to the data on disk). A 'DiskRowSet'
> contains a 'BloomFile', an 'Ad_hoc Index', 'BaseData', a 'DeltaMem' and
> several 'RedoFiles' and 'UndoFiles' (generally, there is only one
> 'UndoFile'). For more specific distribution information, please refer to
> the following figure.
> !kudu table topology.png!
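The topology described above can also be written down as a small data model. This is a simplified sketch and not Kudu's actual classes: the Python class and field names are hypothetical, the string fields are placeholders for the real files, and the RowSet layer is folded into `Tablet` on the assumption that each tablet has one 'MemRowSet' and many 'DiskRowSets'.

```python
from dataclasses import dataclass, field


@dataclass
class DiskRowSet:
    # Component names follow the description above; values are placeholders.
    bloom_file: str = "BloomFile"
    ad_hoc_index: str = "Ad_hoc Index"
    base_data: str = "BaseData"
    delta_mem: str = "DeltaMem"
    redo_files: list[str] = field(default_factory=list)
    undo_files: list[str] = field(default_factory=lambda: ["UndoFile"])


@dataclass
class Tablet:
    metadata: dict[str, str] = field(default_factory=dict)
    mem_row_set: list[tuple] = field(default_factory=list)  # rows in memory
    disk_row_sets: list[DiskRowSet] = field(default_factory=list)


@dataclass
class Table:
    name: str
    tablets: list[Tablet] = field(default_factory=list)


def disk_row_set_count(table: Table) -> int:
    """Count the DiskRowSets across all tablets of a table."""
    return sum(len(tablet.disk_row_sets) for tablet in table.tablets)
```

For example, a table with two tablets holding two and one 'DiskRowSets' respectively has three 'DiskRowSets' in total, each with its own bloom file, index, base data and delta files.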
--
This message was sent by Atlassian Jira
(v8.20.10#820010)