[
https://issues.apache.org/jira/browse/TUBEMQ-120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Guocheng Zhang updated TUBEMQ-120:
----------------------------------
Description:
1. Data read and write operations should take the characteristics of the disk
into account. For example, the disk uses 512-byte sectors as its storage unit
and reads data in batches of 64 KB, and the file system evicts cached pages
from memory according to certain rules. If the read and write operations take
these factors into account, I believe the current TPS can be higher.
-----------------------------------
I understand the reason for this problem. As I see it, data alignment needs to
be considered when storing data to disk, so that the head can read the data
with as few accesses as possible. For example, when data b in the picture is
stored unaligned, two sectors need to be accessed, but after aligned storage
the head only needs to access and read one sector:
!image-2020-05-15-11-28-20-118.png!
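To make the sector-crossing argument concrete, here is a minimal sketch in
Java. It is not TubeMQ code: the class name, the method names, and the example
offsets are hypothetical, and it assumes the 512-byte sector size mentioned in
the description (many real devices use 4096 bytes).

```java
final class SectorSpan {
    // Assumed sector size, taken from the issue description.
    static final long SECTOR_SIZE = 512;

    /** Number of sectors touched by a record of length bytes starting at offset. */
    static long sectorsTouched(long offset, long length) {
        if (length <= 0) {
            return 0;
        }
        long first = offset / SECTOR_SIZE;
        long last = (offset + length - 1) / SECTOR_SIZE;
        return last - first + 1;
    }

    /** Rounds offset up to the next sector boundary. */
    static long alignUp(long offset) {
        return (offset + SECTOR_SIZE - 1) / SECTOR_SIZE * SECTOR_SIZE;
    }

    public static void main(String[] args) {
        long offset = 300;  // hypothetical unaligned start position
        long length = 400;  // hypothetical record size, below one sector
        // Unaligned: the record straddles a sector boundary.
        System.out.println(sectorsTouched(offset, length));
        // Aligned: the same record starts on a boundary and fits in one sector.
        System.out.println(sectorsTouched(alignUp(offset), length));
    }
}
```

With these example values the unaligned record spans two sectors and the
aligned one spans a single sector, which matches the case of data b above.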
The performance difference may not be obvious for a single access, but when
reading massive numbers of messages the accumulated gain turns a quantitative
change into a qualitative one. Because TubeMQ internally reads data randomly,
it is all the more necessary to consider this.
This modification introduces a new problem that needs our attention: for the
same amount of data, the storage space used will increase. As shown in the
picture, because the data is aligned, small-packet data (below the sector
size) such as a will occupy more space. However, considering that the overall
performance of the disk is unchanged and that disk capacity can be very large,
I consider this waste acceptable.
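The space cost can be estimated with a small sketch. It assumes, purely for
illustration, that every record is padded up to a whole number of 512-byte
sectors; the actual layout for TubeMQ has not been decided in this issue, and
the class and method names are hypothetical.

```java
final class PaddingOverhead {
    // Assumed sector size, taken from the issue description.
    static final long SECTOR_SIZE = 512;

    /** On-disk size of a record once padded to a whole number of sectors. */
    static long paddedSize(long length) {
        return (length + SECTOR_SIZE - 1) / SECTOR_SIZE * SECTOR_SIZE;
    }

    /** Bytes of padding added to a record of the given length. */
    static long wastedBytes(long length) {
        return paddedSize(length) - length;
    }

    public static void main(String[] args) {
        long small = 100;   // hypothetical small message, below the sector size
        long large = 4000;  // hypothetical larger message
        System.out.println(small + " -> " + paddedSize(small)
                + " (padding " + wastedBytes(small) + ")");
        System.out.println(large + " -> " + paddedSize(large)
                + " (padding " + wastedBytes(large) + ")");
    }
}
```

Under this assumption a 100-byte message occupies 512 bytes, while a 4000-byte
message occupies 4096 bytes, so the relative overhead is largest for messages
well below the sector size and shrinks as messages grow.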
Do you have any suggestions for this optimization? If not, I am going to claim
this task and optimize it according to this idea.
> Aligned disk data storage
> -------------------------
>
> Key: TUBEMQ-120
> URL: https://issues.apache.org/jira/browse/TUBEMQ-120
> Project: Apache TubeMQ
> Issue Type: Sub-task
> Reporter: Guocheng Zhang
> Assignee: Guocheng Zhang
> Priority: Major
> Fix For: 0.5.0
>
> Attachments: image-2020-05-15-11-28-20-118.png