Currently, Doris has several overlapping interfaces for file IO:

- The query layer has FileReader and FileWriter, with implementations for HDFS, S3, Broker, and Local.
- The storage layer has a BlockManager that abstracts blocks, with WriteableFileBlock and ReadableFileBlock.
- For directory management, there is an Env interface that includes directory operations, with RemoteEnv and PosixEnv implementations. BlockManager also has some operations for linking files and deleting blocks. In addition, for S3 and HDFS, classes such as S3StorageBackend contain some file and directory operations, including mkdir, copy, and rm.

Having so many ways to do IO causes the following problems:

- It is messy. It is sometimes unclear which interface to use, and much functionality is duplicated under different abstraction names.
- Adding a feature or fixing a bug requires changes in multiple places. For example, if we want to add a local cache for S3 reads, it has to be added in every place that reads from S3.

We need to unify the IO stack. In fact, IO access can be roughly divided into the following three types:

- Directory operations: create files, delete files, get the file list, etc.
- File write operations
- File read operations

We could implement these APIs for the different storage backends:

- Local file
- S3 file
- HDFS file
- Broker

Once implemented, they can be used in the storage layer (separation of hot and cold data, separation of storage and compute), the query layer (querying S3, querying HDFS), backup and recovery, etc., which avoids repeated development and maintenance.

--
Guolei Yi
Tel:134-3991-0228
Email:yiguo...@gmail.com
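
As a rough illustration of the three kinds of access proposed above (directory operations, file writes, file reads), here is a minimal sketch in Python. Every name in it (`Writer`, `Reader`, `FileSystem`, `MemoryFileSystem`, `copy_file`) is hypothetical and chosen only for this example; none of them is an existing Doris class. The in-memory backend is an assumption that stands in for a real Local, S3, HDFS, or Broker implementation.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class Writer(ABC):
    """File write operation (hypothetical interface)."""

    @abstractmethod
    def append(self, data: bytes) -> None: ...

    @abstractmethod
    def close(self) -> None: ...


class Reader(ABC):
    """File read operation (hypothetical interface)."""

    @abstractmethod
    def size(self) -> int: ...

    @abstractmethod
    def read_at(self, offset: int, length: int) -> bytes: ...


class FileSystem(ABC):
    """Directory operations, plus factories for readers and writers."""

    @abstractmethod
    def create_file(self, path: str) -> Writer: ...

    @abstractmethod
    def open_file(self, path: str) -> Reader: ...

    @abstractmethod
    def delete_file(self, path: str) -> None: ...

    @abstractmethod
    def list_files(self, prefix: str) -> List[str]: ...


class _MemoryWriter(Writer):
    def __init__(self, buf: bytearray) -> None:
        self._buf = buf
        self._closed = False

    def append(self, data: bytes) -> None:
        if self._closed:
            raise ValueError("writer is closed")
        self._buf.extend(data)

    def close(self) -> None:
        self._closed = True


class _MemoryReader(Reader):
    def __init__(self, buf: bytearray) -> None:
        self._buf = buf

    def size(self) -> int:
        return len(self._buf)

    def read_at(self, offset: int, length: int) -> bytes:
        # Slicing clamps at the end of the buffer, so short reads are allowed.
        return bytes(self._buf[offset:offset + length])


class MemoryFileSystem(FileSystem):
    """In-memory stand-in for a real backend (assumption for this sketch)."""

    def __init__(self) -> None:
        self._files: Dict[str, bytearray] = {}

    def create_file(self, path: str) -> Writer:
        self._files[path] = bytearray()  # create or truncate
        return _MemoryWriter(self._files[path])

    def open_file(self, path: str) -> Reader:
        if path not in self._files:
            raise FileNotFoundError(path)
        return _MemoryReader(self._files[path])

    def delete_file(self, path: str) -> None:
        if path not in self._files:
            raise FileNotFoundError(path)
        del self._files[path]

    def list_files(self, prefix: str) -> List[str]:
        return sorted(p for p in self._files if p.startswith(prefix))


def copy_file(src_fs: FileSystem, src: str,
              dst_fs: FileSystem, dst: str, chunk: int = 1 << 20) -> None:
    """Copy between any two backends using only the shared interfaces."""
    reader = src_fs.open_file(src)
    writer = dst_fs.create_file(dst)
    offset = 0
    while offset < reader.size():
        data = reader.read_at(offset, chunk)
        writer.append(data)
        offset += len(data)
    writer.close()
```

With interfaces of this shape, a caller such as `copy_file` depends only on the abstract types, so a feature like a local cache for S3 reads could in principle be added once, as a wrapper around a `Reader`, rather than in every call site.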