Currently, there are various interfaces for file IO operations in Doris:

   - The query layer has FileReader and FileWriter, with corresponding
   implementations for HDFS, S3, Broker, and Local.
   - The storage layer has a BlockManager that abstracts files as blocks,
   through WriteableFileBlock and ReadableFileBlock.
   - For directory management, there is an Env interface that includes
   directory operations, implemented by RemoteEnv and PosixEnv. BlockManager
   also has some operations of its own, such as linking files and deleting
   blocks. In addition, for S3 and HDFS there are classes such as
   S3StorageBackend that provide file and directory operations, including
   mkdir, copy, and rm.

Having so many ways to do the same thing causes the following problems:

   - It is messy. It is sometimes unclear which interface to use, and many
   functions are duplicated under different abstraction names.
   - Adding a feature or fixing a bug requires changes in multiple places.
   For example, if we want to add a local cache for S3 reads, it has to be
   added in every one of these places.

We need to unify the IO stack. In fact, access to IO can be roughly divided
into the following three types:

   - Directory operations: create files, delete files, list files, etc.
   - File write operations
   - File read operations

We could then implement these APIs for the different storage backends:

   - Local file
   - S3 file
   - HDFS file
   - Broker
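
To make the idea concrete, here is a rough sketch of what the three
interfaces might look like. All names here (Status, FileSystem, FileReader,
FileWriter, MemFileSystem, write_then_read) are hypothetical and are not
existing Doris classes. The in-memory backend is only a stand-in so the
example is self-contained; the real backends would be Local, S3, HDFS, and
Broker.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal status type, used only in this sketch.
struct Status {
    bool ok;
    std::string msg;
    static Status OK() { return Status{true, ""}; }
    static Status Error(std::string m) { return Status{false, std::move(m)}; }
};

// File read operations.
class FileReader {
public:
    virtual ~FileReader() = default;
    // Reads up to `len` bytes starting at `offset` into `out`.
    virtual Status read_at(size_t offset, size_t len, std::string* out) = 0;
    virtual size_t size() const = 0;
};

// File write operations.
class FileWriter {
public:
    virtual ~FileWriter() = default;
    virtual Status append(const std::string& data) = 0;
    virtual Status close() = 0;
};

// Directory operations; also the factory for readers and writers.
class FileSystem {
public:
    virtual ~FileSystem() = default;
    virtual Status create_file(const std::string& path, std::unique_ptr<FileWriter>* writer) = 0;
    virtual Status open_file(const std::string& path, std::unique_ptr<FileReader>* reader) = 0;
    virtual Status delete_file(const std::string& path) = 0;
    // Lists every file whose path starts with `prefix`.
    virtual Status list(const std::string& prefix, std::vector<std::string>* files) = 0;
};

// Toy in-memory backend (assumption for illustration, not a proposed backend).
class MemFileSystem : public FileSystem {
public:
    Status create_file(const std::string& path, std::unique_ptr<FileWriter>* writer) override {
        files_[path].clear();
        *writer = std::make_unique<MemWriter>(&files_[path]);
        return Status::OK();
    }
    Status open_file(const std::string& path, std::unique_ptr<FileReader>* reader) override {
        auto it = files_.find(path);
        if (it == files_.end()) return Status::Error("not found: " + path);
        *reader = std::make_unique<MemReader>(&it->second);
        return Status::OK();
    }
    Status delete_file(const std::string& path) override {
        if (files_.erase(path) == 0) return Status::Error("not found: " + path);
        return Status::OK();
    }
    Status list(const std::string& prefix, std::vector<std::string>* files) override {
        files->clear();
        for (const auto& kv : files_)
            if (kv.first.compare(0, prefix.size(), prefix) == 0) files->push_back(kv.first);
        return Status::OK();
    }

private:
    struct MemWriter : FileWriter {
        explicit MemWriter(std::string* buf) : buf_(buf) { assert(buf != nullptr); }
        Status append(const std::string& data) override { buf_->append(data); return Status::OK(); }
        Status close() override { return Status::OK(); }
        std::string* buf_;
    };
    struct MemReader : FileReader {
        explicit MemReader(const std::string* buf) : buf_(buf) { assert(buf != nullptr); }
        Status read_at(size_t offset, size_t len, std::string* out) override {
            if (offset > buf_->size()) return Status::Error("offset past end of file");
            *out = buf_->substr(offset, len);
            return Status::OK();
        }
        size_t size() const override { return buf_->size(); }
        const std::string* buf_;
    };
    std::map<std::string, std::string> files_;
};

// Caller code depends only on the abstract interfaces, not on the backend.
std::string write_then_read(FileSystem& fs, const std::string& path, const std::string& data) {
    std::unique_ptr<FileWriter> writer;
    if (!fs.create_file(path, &writer).ok) return "";
    if (!writer->append(data).ok || !writer->close().ok) return "";
    std::unique_ptr<FileReader> reader;
    if (!fs.open_file(path, &reader).ok) return "";
    std::string out;
    if (!reader->read_at(0, reader->size(), &out).ok) return "";
    return out;
}
```

The point of the sketch is that write_then_read takes a FileSystem reference
and never names a backend, so the same caller code would work unchanged
against any of the backends listed above.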

Once implemented, it can be used in the storage layer (hot and cold data
separation, separation of storage and compute), the query layer (querying
S3, querying HDFS), backup and recovery, etc., which avoids repeated
development and maintenance.
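
As one illustration of how this avoids repeated work, the local cache for
S3 reads mentioned above could be written once as a wrapper around a reader
interface, instead of once per call site. This is only a sketch under
assumptions: FileReader, FakeRemoteReader, and CachedFileReader are
hypothetical names, the "remote" reader is a fake that just counts calls,
and the cache is a simple in-memory map keyed by exact (offset, len) rather
than a real on-disk block cache with eviction.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Hypothetical reader interface; not an existing Doris class.
class FileReader {
public:
    virtual ~FileReader() = default;
    // Reads up to `len` bytes starting at `offset`; returns false on error.
    virtual bool read_at(size_t offset, size_t len, std::string* out) = 0;
};

// Stand-in for a slow remote reader such as S3; counts how often it is hit.
class FakeRemoteReader : public FileReader {
public:
    explicit FakeRemoteReader(std::string data) : data_(std::move(data)) {}
    bool read_at(size_t offset, size_t len, std::string* out) override {
        ++remote_reads;
        if (offset > data_.size()) return false;
        *out = data_.substr(offset, len);
        return true;
    }
    int remote_reads = 0;

private:
    std::string data_;
};

// Wraps any FileReader and serves repeated identical ranges from memory.
class CachedFileReader : public FileReader {
public:
    explicit CachedFileReader(FileReader* inner) : inner_(inner) { assert(inner != nullptr); }
    bool read_at(size_t offset, size_t len, std::string* out) override {
        const auto key = std::make_pair(offset, len);
        auto it = cache_.find(key);
        if (it != cache_.end()) {  // cache hit: no remote access
            *out = it->second;
            return true;
        }
        if (!inner_->read_at(offset, len, out)) return false;
        cache_[key] = *out;  // remember the range for next time
        return true;
    }

private:
    FileReader* inner_;  // not owned
    std::map<std::pair<size_t, size_t>, std::string> cache_;
};
```

Because CachedFileReader is itself a FileReader, every caller that accepts
a reader gets caching without being modified.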

-- 
Guolei Yi
Tel:134-3991-0228
Email:yiguo...@gmail.com
