tristaZero commented on a change in pull request #101: Sharding-JDBC manual modification
URL: https://github.com/apache/incubator-shardingsphere-doc/pull/101#discussion_r267273391
##########
File path: document/current/content/features/orchestration/encrypt.en.md
##########
@@ -1,15 +1,42 @@
 +++
 pre = "<b>3.3.5. </b>"
 toc = true
-title = "Data Masking"
+title = "Data Desensitization"
 weight = 5
+
 +++
 ## Background
-TODO
-## Solutions
+Security control has always been a crucial link of data orchestration; data desensitization falls into this category. For both Internet enterprises and traditional sectors, data security has always been a highly focused and sensitive topic. Data desensitization refers to transforming some sensitive information through desensitization rules to safely protect the private data. Data that involves client security or business sensibility, such as ID number, phone number, card number, client number and other personal information, is required of data desensitization according to relevant regulations.
+
+Because of that, ShardingSphere has provided the function of data desensitization, which stores users' sensitive information in the database after encryption. When users search for them, they will be decrypted and returned to users as the original data. It has the encryption and decryption processes totally transparent to users, who can store desensitized data and acquire original data without any awareness. In addition, ShardingSphere has provided internal desensitization algorithm, which can directly used by users. In the same time, we have also provided desensitization algorithm related interfaces, which can be implemented by users themselves. Then, after simple configurations, ShardingSphere can use algorithms provided by users to perform encryption, decryption and desensitization operations.
+
+## Solution
+
+ShardingSphere has provided two data desensitization solutions, corresponding to two ShardingSphere encryption and decryption interfaces, i.e., `ShardingEncryptor` and `ShardingQueryAssistedEncryptor`.
+
+On the one hand, ShardingSphere has provided internal encryption and decryption implementations for users, which can be used by them only after configuration. On the other hand, to satisfy users' requirements for different scenarios, we have also opened relevant encryption and decryption interfaces, according to which users can provide specific implementation types. Then, after simple configurations, ShardingSphere can use encryption and decryption solutions defined by users themselves to desensitize data.
 ### ShardingEncryptor
+The solution has provided two methods, `encrypt()` and `decrypt()`, to encrypt and decrypt data to be desensitized.
+
+When users perform `INSERT`, `DELETE` and `UPDATE` operations, ShardingSphere will parse, rewrite and route SQL. It will also use `encrypt()` to encrypt data and store them in the database. When using `SELECT`, they will reversely decrypt sensitive data from the database with `decrypt()` and return them to users at last.
+Currently, ShardingSphere has provided two implementation types for this kind of desensitization solution, MD5 (irreversible) and AES (reversible), which can be used only after users' configuration.
+
 ### ShardingQueryAssistedEncryptor
+
+Compared with the first desensitization scheme, this one is more secure and complex. Its concept is: even the same data, two same user passwords for example, should not be stored as the same desensitized form in the database. It can help to protect user information and avoid credential stuffing.
+
+This scheme provides three functions to implement, `encrypt()`, `decrypt()` and `queryAssistedEncrypt()`.
+In `encrypt()` phase, users can set some variable, timestamp for example, and encrypt a combination of original data + variable. This method can make sure the encrypted desensitization data of the same original data are different, due to the existence of variables.
+In `decrypt()` phase, users can use variable data to decrypt according to the encryption algorithms set formerly.
+
+Though this method can indeed increase data security, another problem can appear with it: as the same data is stored in the database in different content, users may not be able to find out all the same original data with equivalent query (`SELECT FROM table WHERE encryptedColumnn = ?`) according to this encryption column.
+
+Because of it, we have brought out the concept of assistant query column, which is generated by `queryAssistedEncrypt()`. Different from `decrypt()`, this method uses another way to encrypt the original data; but for the same original data, it can generate consistent encryption data. Users can store data processed by `queryAssistedEncrypt()` to assist the query of original data. So there may be one more assistant query column in the table.
+
+`queryAssistedEncrypt()` and `encrypt()` can generate and store different encryption data; `decrypt()` is reversible and `queryAssistedEncrypt()` is irreversible. So when querying the original data, we will parse, rewrite and route SQL automatically. We will also use assistant query column to do `WHERE` condition query and use `decrypt()` to decrypt `encrypt()` data and return them to users. All these can not be felt by users.

Review comment:
`decrypt() is reversible and queryAssistedEncrypt() is irreversible.` ---> `The data generated by encrypt() can be decrypted by decrypt(), but there is no function to decrypt the encrypted data from queryAssistedEncrypt()`

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

With regards,
Apache Git Services
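
For reference, the distinction the review comment draws can be sketched in Java roughly as follows. This is only an illustration under stated assumptions, not the ShardingSphere interface or any of its built-in encryptors: the class name `QueryAssistedSketch` is hypothetical, AES-GCM with a random IV stands in for the "original data + variable" idea, and an unkeyed SHA-256 digest stands in for the query-assist transform.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.SecureRandom;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical illustration only; not the ShardingSphere interface or its built-in encryptors.
class QueryAssistedSketch {
    private static final int IV_BYTES = 12;   // random per-value "variable" (assumption)
    private static final int TAG_BITS = 128;  // GCM authentication tag length

    private final SecretKeySpec key;
    private final SecureRandom random = new SecureRandom();

    // rawKey must be 16, 24 or 32 bytes long for AES.
    QueryAssistedSketch(byte[] rawKey) {
        this.key = new SecretKeySpec(rawKey, "AES");
    }

    // Reversible; the fresh IV makes two encryptions of the same value differ.
    String encrypt(String plain) {
        try {
            byte[] iv = new byte[IV_BYTES];
            random.nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            byte[] body = cipher.doFinal(plain.getBytes(StandardCharsets.UTF_8));
            // Store the IV in front of the ciphertext so decrypt() can recover it.
            byte[] stored = new byte[IV_BYTES + body.length];
            System.arraycopy(iv, 0, stored, 0, IV_BYTES);
            System.arraycopy(body, 0, stored, IV_BYTES, body.length);
            return Base64.getEncoder().encodeToString(stored);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Inverse of encrypt(): reads the IV back from the stored value.
    String decrypt(String stored) {
        try {
            byte[] raw = Base64.getDecoder().decode(stored);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, raw, 0, IV_BYTES));
            byte[] plain = cipher.doFinal(raw, IV_BYTES, raw.length - IV_BYTES);
            return new String(plain, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Deterministic digest for the assistant query column; no inverse function is provided.
    String queryAssistedEncrypt(String plain) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hashed = digest.digest(plain.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(hashed);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

In this sketch a `WHERE` condition would compare the assistant query column against `queryAssistedEncrypt(value)`, while the value returned to the user comes from `decrypt()` applied to the `encrypt()` column. An unkeyed digest of low-entropy values can be guessed by brute force, so a keyed construction such as an HMAC may be preferable in practice.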
