Guocheng Zhang created TUBEMQ-110:
-------------------------------------
Summary: Transform Broker storage to improve throughput
Key: TUBEMQ-110
URL: https://issues.apache.org/jira/browse/TUBEMQ-110
Project: Apache TubeMQ
Issue Type: Task
Reporter: Guocheng Zhang
I think the Broker's read and write performance still has considerable room
for improvement. We need to keep iterating on the storage performance of the
system. I have listed some considerations below and hope to get some better
suggestions:
1. Data read and write operations should take the characteristics of the disk
into account. For example, the disk uses 512-byte sectors as its storage unit
and reads data in batches of 64 KB, and the file system evicts cached pages
from memory according to certain rules. If the read and write operations take
these factors into account, I believe the current TPS can be higher;
2. Storage should address disk space fragmentation, for example by
pre-allocating fixed-length files and reusing expired files, so that disk files
can be read sequentially and data read and write speed improves;
3. The number of memory cache blocks should be configurable: the current memory
cache is managed with a fixed configuration of 2 memory blocks per topic. We
should allow users to allocate more memory cache space based on actual
resource conditions;
4. More efficient memory-to-disk flushing: at present, the flush operation
writes messages from memory to disk one by one. This part can be changed to
write to disk in batches, one memory block at a time, thereby improving storage
efficiency;
5. Remove the SSD-assisted consumption feature: because the SSD capacity is too
small, SSD-assisted consumption is not suitable for practical applications, so
it should be removed to avoid user confusion, and the related configurations
and settings need to be cleaned up;
6. Stored files should carry a file header that includes the data version
information, so that later storage schemes can remain seamlessly compatible
with the data format of the old version;
7. Add a CheckPoint mechanism: on restart, the current system checks the
validity of only the last file. In fact, when the system is shut down, there
may be multiple consecutive files whose data is still in memory, so checking
only the last file can easily let abnormal data get mixed into the data
stream. We should add a CheckPoint mechanism to handle this abnormal
situation.
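
As a rough illustration of points 4 and 6 only, the Java sketch below writes a
versioned file header padded to one 512-byte sector and then flushes a whole
memory block of messages with a single batched write instead of one write per
message. It is not the Broker's actual implementation: the class name, the
magic number, the header layout and the length-prefixed message framing are
all assumptions made for this example.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

// Hypothetical sketch; none of these names come from the TubeMQ code base.
final class SegmentFileSketch {
    static final int MAGIC = 0x54425351;    // assumed marker identifying a data file
    static final short FORMAT_VERSION = 1;  // assumed data format version (point 6)
    static final int HEADER_SIZE = 512;     // header padded to one 512-byte sector

    // Creates the file and writes the zero-padded, versioned header.
    static void create(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer header = ByteBuffer.allocate(HEADER_SIZE);
            header.putInt(MAGIC).putShort(FORMAT_VERSION);
            header.clear(); // position 0, limit 512: write the whole padded header
            while (header.hasRemaining()) {
                ch.write(header);
            }
        }
    }

    // Packs every message of one memory block into a single buffer
    // (4-byte length prefix + payload) and writes it in one batch (point 4).
    // Returns the file position just after the written block.
    static long flushBlock(FileChannel ch, long position, List<byte[]> messages)
            throws IOException {
        int total = 0;
        for (byte[] m : messages) {
            total += Integer.BYTES + m.length;
        }
        ByteBuffer block = ByteBuffer.allocate(total);
        for (byte[] m : messages) {
            block.putInt(m.length).put(m);
        }
        block.flip();
        long pos = position;
        while (block.hasRemaining()) {
            pos += ch.write(block, pos);
        }
        ch.force(false); // one sync per block instead of one per message
        return pos;
    }

    // Reads the header back and returns the stored format version.
    static short readVersion(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer header = ByteBuffer.allocate(HEADER_SIZE);
            while (header.hasRemaining() && ch.read(header) >= 0) {
                // keep reading until the header is complete or EOF is reached
            }
            header.flip();
            if (header.remaining() < Integer.BYTES + Short.BYTES
                    || header.getInt() != MAGIC) {
                throw new IOException("not a recognized data file: " + file);
            }
            return header.getShort();
        }
    }
}
```

A reader would call readVersion first and choose the matching parser for the
rest of the file, which is what lets an older data format stay readable after
the storage scheme changes. The batch size here is simply the whole memory
block; aligning it to the disk's read and write granularity mentioned in
point 1 would be a further refinement.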
--
This message was sent by Atlassian Jira
(v8.3.4#803005)