[
https://issues.apache.org/jira/browse/TUBEMQ-124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jeff Zhou updated TUBEMQ-124:
-----------------------------
Attachment: s15917036711158.png
> Structured index storage
> ------------------------
>
> Key: TUBEMQ-124
> URL: https://issues.apache.org/jira/browse/TUBEMQ-124
> Project: Apache TubeMQ
> Issue Type: Sub-task
> Reporter: Guocheng Zhang
> Assignee: Jeff Zhou
> Priority: Major
> Attachments: s15917036711158.png, screenshot-1.png
>
>
> 1. Structured index storage: optimize the current index storage by adding a
> structured index, so that data can be located quickly through the index when
> it is read. The trade-off is that the extra index structure may make write
> requests slower, and it will take more time to check and restore the index
> when the system restarts.
> --------------------------------------------------------------------------
> To solve this problem, I plan to implement it like this:
> !screenshot-1.png!
> First, add 2 bytes of version information at the end of the segment file.
> Then divide the data in the index segment file into buckets, and use a Bloom
> filter to record, for each bucket, the bit positions of the filter items of
> the data in that bucket. After this change, the index segment file has a
> two-level index: the Bloom filter bitmap is the first level, and the index
> bucket holding the message index information is the second level.
> When filtering consumption, the system first checks the first level to see
> whether the filter item may exist in the corresponding data bucket. If it
> does not, the system moves on to the next data bucket, and so on until the
> index segment file is exhausted, at which point filtering switches to the
> next index segment file. If the filter item may be in a data bucket, the data
> in that bucket is read using the current index file retrieval method.
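
The lookup described above can be sketched roughly as follows. This is only an illustration, not the TubeMQ implementation: the names `IndexBucket`, `bit_positions` and `find_offsets`, the bitmap size, the number of hash functions and the use of SHA-256 are all assumptions made for the example.

```python
import hashlib

BITMAP_BITS = 1024  # assumed bitmap size per bucket
NUM_HASHES = 3      # assumed number of hash functions


def bit_positions(filter_item):
    """Derive NUM_HASHES bit positions for a filter item (hypothetical scheme)."""
    digest = hashlib.sha256(filter_item.encode("utf-8")).digest()
    # Use 4 bytes of the digest per hash function.
    return [
        int.from_bytes(digest[4 * i:4 * i + 4], "big") % BITMAP_BITS
        for i in range(NUM_HASHES)
    ]


class IndexBucket:
    """Second-level index entries guarded by a first-level Bloom bitmap."""

    def __init__(self):
        self.bitmap = 0    # Bloom filter bitmap, kept as an int for brevity
        self.entries = []  # (filter_item, message_offset) pairs

    def add(self, filter_item, message_offset):
        for pos in bit_positions(filter_item):
            self.bitmap |= 1 << pos
        self.entries.append((filter_item, message_offset))

    def may_contain(self, filter_item):
        # False means "definitely absent"; True means "possibly present".
        return all((self.bitmap >> pos) & 1 for pos in bit_positions(filter_item))


def find_offsets(buckets, filter_item):
    """Return (matching offsets, number of buckets actually scanned)."""
    offsets = []
    scanned = 0
    for bucket in buckets:
        if not bucket.may_contain(filter_item):
            continue  # first level says no: skip the whole bucket
        scanned += 1
        # Second level: item-by-item check inside the bucket only.
        offsets.extend(off for item, off in bucket.entries if item == filter_item)
    return offsets, scanned
```

Because the item-by-item check still runs inside every bucket that passes the bitmap test, a false positive only costs an extra bucket scan and never returns a wrong message.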
> Estimated effect: a Bloom filter can report false positives, so a hit does
> not guarantee that the bucket actually contains the filter item. Even so, it
> should be an improvement over the current item-by-item inspection: in the
> worst case the filtering effect is the same as today, and it helps a great
> deal when the set of index items is sparse and non-colliding. The cost is
> additional index storage space, and index file recovery requires special
> attention.
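
To get a feel for this trade-off, the standard Bloom filter approximation p ≈ (1 - e^(-kn/m))^k can be evaluated for a given bitmap size m (bits), number of hash functions k and number of distinct filter items n per bucket. The sketch below is only a back-of-the-envelope aid; the function names and any parameter values passed to them are assumptions, not figures from the design.

```python
import math


def false_positive_rate(bitmap_bits, num_hashes, distinct_items):
    """Approximate Bloom filter false-positive probability for one bucket."""
    fill = 1.0 - math.exp(-num_hashes * distinct_items / bitmap_bits)
    return fill ** num_hashes


def bitmap_overhead_bytes(num_buckets, bitmap_bits):
    """Extra index storage needed for the first-level bitmaps."""
    return num_buckets * math.ceil(bitmap_bits / 8)
```

The more distinct filter items land in one bucket, the closer the rate gets to 1, which corresponds to the worst case where every bucket has to be scanned as it is today.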
> If the design needs to be implemented, I think the following points need to
> be considered:
> 1. Because a bitmap index is added, a checkpoint file needs to be added to
> the index store, so that when the system restarts we know the starting
> checkpoint of the index file;
> 2. Because the file structure changes, before releasing the version that
> contains this feature we first need to release a version based on the
> existing one that is compatible with the new format, so that the system can
> be rolled back if the upgrade to the feature version goes wrong. This is a
> one-time operation, and I think the price is worth it.
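
As a rough illustration of how a reader could use the 2-byte version at the end of a segment file, the sketch below appends and checks such a trailer on an in-memory byte string. The big-endian unsigned encoding, the version number 1, and the names `append_version` and `split_version` are assumptions; the issue does not specify them, nor how files written before the change would be recognized.

```python
import struct

TRAILER = struct.Struct(">H")  # assumed: 2-byte big-endian unsigned version
SUPPORTED_VERSIONS = {1}       # hypothetical set of versions this reader knows


def append_version(segment, version):
    """Return the segment bytes with the version trailer appended."""
    return segment + TRAILER.pack(version)


def split_version(segment):
    """Return (body, version), refusing versions the reader does not know."""
    if len(segment) < TRAILER.size:
        raise ValueError("segment too short to hold a version trailer")
    body, trailer = segment[:-TRAILER.size], segment[-TRAILER.size:]
    (version,) = TRAILER.unpack(trailer)
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported index segment version {version}")
    return body, version
```

A compatibility release of the kind described in point 2 would, under these assumptions, be one whose reader already accepts the new version value before any broker starts writing it.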