Guocheng Zhang created TUBEMQ-110:
-------------------------------------

             Summary: Transform Broker storage to improve throughput
                 Key: TUBEMQ-110
                 URL: https://issues.apache.org/jira/browse/TUBEMQ-110
             Project: Apache TubeMQ
          Issue Type: Task
            Reporter: Guocheng Zhang


I think the current Broker's read and write performance still has considerable
room for improvement, and we need to keep iterating to improve the storage
performance of the system. I have listed some considerations below and hope to
get some better suggestions:

1. Data read and write operations should take the characteristics of the disk
into account. For example, the disk uses 512-byte sectors as its storage unit
and reads data in 64 KB batches, and the file system evicts cached pages from
memory according to certain rules. If read and write operations take these
factors into account, I believe the current TPS can be higher;

2. Storage should address disk-space fragmentation, for example by
pre-allocating fixed-length files and reusing aged files, so that disk files can
be read sequentially and data read and write speed improves;

3. The number of memory cache blocks should be configurable: the memory cache is
currently managed with a fixed configuration of 2 memory blocks per topic. We
should allow users to allocate more memory cache space based on the resources
actually available;

4. More efficient memory-to-disk flushing: at present, the flush operation
writes messages from memory to disk one by one. This can be changed to write
each memory block to disk as a single batch, thereby improving storage
efficiency;

5. Remove the SSD-assisted consumption feature: because SSD capacity is too
small, consuming from SSD storage is not suitable for practical use. The feature
should be removed to avoid confusing users, and the related configuration and
settings need to be cleaned up;

6. Stored files should carry a richer file header, including data version
information, so that future storage schemes remain seamlessly compatible with
data written in older formats;

7. Add a CheckPoint mechanism: currently, on restart, the system checks the
validity of only the last file. In fact, when the system shuts down, several
consecutive files may still have data held in memory, so checking only the last
file can easily let abnormal data mix into the data stream. A CheckPoint
mechanism should be added to handle this situation.
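Point 1 can be illustrated with a helper that widens an arbitrary read request to 512-byte sector boundaries and then cuts it into 64 KB batches. This is a minimal sketch assuming a power-of-two sector size; the class `AlignedReadPlanner` is hypothetical and not part of TubeMQ, and the two sizes are simply the figures quoted in point 1.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: plans disk reads aligned to sector boundaries and
// split into fixed-size batches (sizes taken from point 1 of the issue).
final class AlignedReadPlanner {
    static final int SECTOR_SIZE = 512;       // sector size quoted in point 1
    static final int BATCH_SIZE = 64 * 1024;  // batch read size quoted in point 1

    // Round down to the nearest multiple of 'unit' (unit must be a power of two).
    static long alignDown(long offset, int unit) {
        return offset & ~((long) unit - 1);
    }

    // Round up to the nearest multiple of 'unit' (unit must be a power of two).
    static long alignUp(long offset, int unit) {
        return (offset + unit - 1) & ~((long) unit - 1);
    }

    // Expand [offset, offset + length) to sector boundaries, then cut it into
    // batches of at most BATCH_SIZE bytes. Each element is {start, size}.
    static List<long[]> plan(long offset, long length) {
        List<long[]> batches = new ArrayList<>();
        long start = alignDown(offset, SECTOR_SIZE);
        long end = alignUp(offset + length, SECTOR_SIZE);
        for (long pos = start; pos < end; pos += BATCH_SIZE) {
            batches.add(new long[] {pos, Math.min(BATCH_SIZE, end - pos)});
        }
        return batches;
    }
}
```

The planned ranges begin and end on sector boundaries, so once the data is read the caller must discard the bytes before `offset` and after `offset + length`.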
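For point 2, one possible approach is a pool that recycles aged segment files instead of deleting them and fills new segments up front. This is a sketch under stated assumptions: `SegmentFilePool`, `retire` and `acquire` are hypothetical names, and zeros are written because `RandomAccessFile.setLength` alone may only create a sparse file on many file systems. Pre-filling forces the blocks to be allocated at creation time, but whether they end up contiguous is still decided by the file system.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical pool of fixed-length segment files: aged segments are recycled
// (renamed and later overwritten) instead of deleted, and new ones are pre-filled.
final class SegmentFilePool {
    private final Path dir;
    private final long segmentSize;
    private final Deque<Path> recycled = new ArrayDeque<>();

    SegmentFilePool(Path dir, long segmentSize) {
        this.dir = dir;
        this.segmentSize = segmentSize;
    }

    // Called when a segment has aged out: keep the file for reuse instead of deleting it.
    void retire(Path agedSegment) {
        recycled.addLast(agedSegment);
    }

    // Return a segment file of segmentSize bytes stored under the given name.
    Path acquire(String name) throws IOException {
        Path target = dir.resolve(name);
        Path old = recycled.pollFirst();
        if (old != null) {
            // Reuse: the blocks already allocated to the aged file are kept.
            return Files.move(old, target);
        }
        // Pre-allocate by writing zeros so the blocks are allocated now, not lazily.
        ByteBuffer zeros = ByteBuffer.allocate(64 * 1024);
        try (FileChannel ch = FileChannel.open(target,
                StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE)) {
            long written = 0;
            while (written < segmentSize) {
                zeros.clear();
                zeros.limit((int) Math.min(zeros.capacity(), segmentSize - written));
                written += ch.write(zeros);
            }
            ch.force(true); // persist the allocation before handing the file out
        }
        return target;
    }
}
```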
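Point 3 could be served by a configuration lookup that keeps today's fixed value of 2 blocks per topic as the default while allowing a global or per-topic override. The property keys `memCacheBlockCount` and `memCacheBlockCount.<topic>` and the class `MemCacheConfig` are hypothetical, chosen only for illustration.

```java
import java.util.Properties;

// Hypothetical configuration reader: the number of memory cache blocks is
// configurable globally and per topic, defaulting to the current fixed value of 2.
final class MemCacheConfig {
    static final int DEFAULT_BLOCKS_PER_TOPIC = 2; // current fixed behaviour
    private final Properties props;

    MemCacheConfig(Properties props) {
        this.props = props;
    }

    // Resolution order: per-topic key, then global key, then the default.
    int blocksFor(String topic) {
        String value = props.getProperty("memCacheBlockCount." + topic,
                props.getProperty("memCacheBlockCount"));
        if (value == null) {
            return DEFAULT_BLOCKS_PER_TOPIC;
        }
        int blocks = Integer.parseInt(value.trim());
        if (blocks < 1) {
            throw new IllegalArgumentException("memCacheBlockCount must be >= 1 for " + topic);
        }
        return blocks;
    }

    // Memory a topic would reserve, so operators can size the cache to the host.
    long bytesFor(String topic, int blockSizeBytes) {
        return (long) blocksFor(topic) * blockSizeBytes;
    }
}
```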
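The batched flush in point 4 amounts to buffering messages in a memory block and handing the whole block to the file in one write. A minimal sketch, assuming a single writer thread and a hypothetical `BatchFlushBlock` class; it does not call `FileChannel.force`, since when to force data onto the physical disk is a separate durability trade-off.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

// Hypothetical memory block that buffers messages and flushes them to the
// file as one contiguous write, instead of one write per message.
final class BatchFlushBlock {
    private final ByteBuffer block;
    private final FileChannel channel;
    private int flushCount = 0; // number of non-empty flushes, for illustration

    BatchFlushBlock(FileChannel channel, int capacityBytes) {
        this.channel = channel;
        this.block = ByteBuffer.allocateDirect(capacityBytes);
    }

    // Append one message; flush the whole block first if the message would not fit.
    void append(byte[] message) throws IOException {
        if (message.length > block.capacity()) {
            throw new IllegalArgumentException("message larger than block");
        }
        if (message.length > block.remaining()) {
            flush();
        }
        block.put(message);
    }

    // Write all buffered messages as one batch; write() may accept fewer
    // bytes than requested, so loop until the block is drained.
    void flush() throws IOException {
        if (block.position() == 0) {
            return; // nothing buffered
        }
        block.flip();
        while (block.hasRemaining()) {
            channel.write(block);
        }
        block.clear();
        flushCount++;
    }

    int flushCount() {
        return flushCount;
    }
}
```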
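For point 6, a fixed-size header carrying a magic marker, a format version and its own length would let a reader recognise old files and skip header fields it does not understand. The layout, the `StoreFileHeader` class and the magic value are hypothetical; treating files without the marker as a header-less legacy format (version 0) is an assumption about how migration might work.

```java
import java.nio.ByteBuffer;

// Hypothetical fixed-size header placed at the start of every data file.
// Layout (big-endian): magic(4) | formatVersion(2) | headerLength(2) | createTimeMs(8)
final class StoreFileHeader {
    static final int MAGIC = 0x54554245;  // hypothetical marker, ASCII "TUBE"
    static final short CURRENT_VERSION = 1;
    static final int LENGTH = 16;

    final short version;
    final long createTimeMs;

    StoreFileHeader(short version, long createTimeMs) {
        this.version = version;
        this.createTimeMs = createTimeMs;
    }

    ByteBuffer encode() {
        ByteBuffer buf = ByteBuffer.allocate(LENGTH);
        buf.putInt(MAGIC).putShort(version).putShort((short) LENGTH).putLong(createTimeMs);
        buf.flip();
        return buf;
    }

    // Files without the magic marker are treated as the legacy, header-less
    // format (version 0); in that case nothing is consumed from the buffer.
    static StoreFileHeader decode(ByteBuffer buf) {
        if (buf.remaining() < LENGTH || buf.getInt(buf.position()) != MAGIC) {
            return new StoreFileHeader((short) 0, 0L);
        }
        buf.getInt(); // skip magic
        short version = buf.getShort();
        short headerLength = buf.getShort();
        long createTimeMs = buf.getLong();
        if (headerLength < LENGTH) {
            throw new IllegalStateException("corrupt header length " + headerLength);
        }
        // Skip any trailing header fields added by newer versions.
        buf.position(buf.position() + (headerLength - LENGTH));
        return new StoreFileHeader(version, createTimeMs);
    }
}
```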
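Point 7 can be sketched as a recovery routine that re-validates every file from the last checkpoint onward, rather than only the last file. It assumes a hypothetical record layout of `length | crc32 | payload` and a checkpoint already loaded as a (segment index, offset) pair; how the checkpoint is persisted, and the `CheckpointRecovery` class itself, are illustrative only. A zero length is treated as the end of written data, so zero-filled pre-allocated space is not mistaken for records.

```java
import java.nio.ByteBuffer;
import java.util.List;
import java.util.zip.CRC32;

// Hypothetical recovery routine. Each record is laid out as
// length(4) | crc32(4) | payload(length). The checkpoint marks the segment and
// offset known to be durable; everything after it is re-validated on restart.
final class CheckpointRecovery {

    // Returns the end of the valid data in 'segment', scanning from 'fromOffset'
    // and stopping at the first empty, truncated or corrupted record.
    static int validEnd(ByteBuffer segment, int fromOffset) {
        int pos = fromOffset;
        CRC32 crc = new CRC32();
        while (pos + 8 <= segment.limit()) {
            int length = segment.getInt(pos);
            int storedCrc = segment.getInt(pos + 4);
            if (length <= 0 || pos + 8 + length > segment.limit()) {
                break; // unwritten (zero-filled) space or truncated record
            }
            ByteBuffer payload = segment.duplicate();
            payload.position(pos + 8);
            payload.limit(pos + 8 + length);
            crc.reset();
            crc.update(payload);
            if ((int) crc.getValue() != storedCrc) {
                break; // corrupted record
            }
            pos += 8 + length;
        }
        return pos;
    }

    // Validates every segment from the checkpointed one onward, not only the last.
    // Returns, for each of those segments, the offset at which to truncate it.
    static int[] recover(List<ByteBuffer> segments, int checkpointSegment, int checkpointOffset) {
        int[] ends = new int[segments.size() - checkpointSegment];
        for (int i = checkpointSegment; i < segments.size(); i++) {
            int from = (i == checkpointSegment) ? checkpointOffset : 0;
            ends[i - checkpointSegment] = validEnd(segments.get(i), from);
        }
        return ends;
    }
}
```

What to do with segments that follow a truncated one (drop them, or keep them after separate verification) is a policy choice this sketch leaves to the caller.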




--
This message was sent by Atlassian Jira
(v8.3.4#803005)