dockerzhang opened a new issue #723:
URL: https://github.com/apache/incubator-inlong/issues/723
<p>1. Structured index storage: optimize the current index storage, for
example by adding a structured index so that data can be located quickly
through the index at read time. The trade-off is that the extra index
structure may slow down write requests, and checking and restoring the index
will take longer when the system restarts.</p>
<p>--------------------------------------------------------------------------</p>
<p>To solve this problem, I plan to implement it like this:<br/>
<span class="image-wrap" style=""><img
src="https://issues.apache.org/jira/secure/attachment/13003167/13003167_screenshot-1.png"
style="border: 0px solid black" /></span> </p>
<p>First, add 2 bytes of version information at the end of the segment
file. Then divide the data in the index segment file into buckets, and use a
Bloom filter to record the filter items of the data in each bucket. After this
improvement, the index segment file has a two-level index: the Bloom filter
bitmap is the first level, and the index bucket holding the message index
information is the second level. </p>
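<p>A minimal sketch of what the first-level structure for one bucket could
look like. The class name <code>BucketBloomFilter</code>, the bitmap size, the
number of hash functions, and the FNV-1a based double hashing are all
assumptions made for illustration; the proposal does not specify them.</p>

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

/** Hypothetical first-level index for one bucket: a Bloom filter over filter items. */
final class BucketBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    BucketBloomFilter(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    /** Record that a message carrying this filter item was indexed in the bucket. */
    void add(String filterItem) {
        for (int i = 0; i < numHashes; i++) {
            bits.set(position(filterItem, i));
        }
    }

    /** false: the item is definitely absent. true: it may be present (false positives possible). */
    boolean mightContain(String filterItem) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(position(filterItem, i))) {
                return false;
            }
        }
        return true;
    }

    // Double hashing: the i-th position is (h1 + i * h2) mod numBits.
    private int position(String item, int i) {
        byte[] data = item.getBytes(StandardCharsets.UTF_8);
        int h1 = fnv1a(data, 0x811C9DC5);
        int h2 = fnv1a(data, 0x01000193) | 1; // force an odd step
        return Math.floorMod(h1 + i * h2, numBits);
    }

    // 32-bit FNV-1a style hash with a caller-supplied starting value.
    private static int fnv1a(byte[] data, int seed) {
        int h = seed;
        for (byte b : data) {
            h ^= (b & 0xFF);
            h *= 0x01000193;
        }
        return h;
    }
}
```

<p>An item that was added is always reported as possibly present, so a
<code>false</code> answer lets a reader skip the whole bucket safely.</p>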
<p>When consuming with a filter, the system first checks at the first level
whether the filter item may exist in a data bucket. If it does not, the system
moves on to the next data bucket, and once the index segment file is
exhausted it switches to the next index segment file. If the filter item may
be in a data bucket, the data in that bucket is read using the current index
file retrieval method. </p>
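<p>The lookup described above can be sketched roughly as follows. This is an
illustration under stated assumptions, not the actual storage code: filter
items are represented as <code>int</code> codes, each bucket carries a 64-bit
bitmap, and <code>SegmentScan</code>, <code>Bucket</code>, and
<code>mask</code> are hypothetical names.</p>

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the two-level lookup within one index segment. */
final class SegmentScan {

    /** One bucket: a 64-bit Bloom bitmap (level 1) plus its index entries (level 2). */
    static final class Bucket {
        long bitmap;
        final List<int[]> entries = new ArrayList<>(); // each entry: {filterCode, dataOffset}

        void add(int filterCode, int dataOffset) {
            bitmap |= mask(filterCode);
            entries.add(new int[] {filterCode, dataOffset});
        }
    }

    // Two bit positions derived from the filter code; a stand-in for real hash functions.
    static long mask(int filterCode) {
        int h = filterCode * 0x9E3779B9;
        return (1L << (h >>> 26)) | (1L << ((h >>> 20) & 63));
    }

    /** Returns the data offsets of matching entries, skipping buckets the bitmap rules out. */
    static List<Integer> find(List<Bucket> segment, int filterCode) {
        long m = mask(filterCode);
        List<Integer> offsets = new ArrayList<>();
        for (Bucket bucket : segment) {
            if ((bucket.bitmap & m) != m) {
                continue; // level 1 says "definitely absent": skip the bucket
            }
            for (int[] entry : bucket.entries) { // possible hit: item-by-item check
                if (entry[0] == filterCode) {
                    offsets.add(entry[1]);
                }
            }
        }
        return offsets;
    }
}
```

<p>Because the second-level check is still exact, a false positive in the
bitmap only costs an unnecessary bucket scan; it never returns a wrong
message.</p>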
<p>Estimated effect: a Bloom filter can report false positives, so a match
in the bitmap does not guarantee that the bucket actually contains the filter
item. Even so, it should improve on the current item-by-item inspection: in
the worst case the filtering cost is the same as today, and it helps a great
deal when the set of index items is sparse and rarely collides. The cost is
additional index storage space, and index file recovery requires special
attention.</p>
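<p>To get a rough feel for how often the bitmap would give a false match, the
standard Bloom filter approximation can be used. The sizes below are
hypothetical and chosen only for illustration: a bitmap of m = 1024 bits per
bucket, n = 100 distinct filter items in the bucket, and k = 3 hash
functions.</p>

```latex
p \approx \left(1 - e^{-kn/m}\right)^{k}
  = \left(1 - e^{-300/1024}\right)^{3}
  \approx (1 - 0.746)^{3}
  \approx 0.016
```

<p>Under these assumed sizes, roughly 1.6% of the buckets that do not contain
the filter item would still be scanned; a smaller bitmap or more items per
bucket raises that rate.</p>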
<p>If this design is implemented, I think the following points need to be
considered:<br/>
1. Because a bitmap index is added, a checkpoint file needs to be added to
the index store, so that when the system restarts we know the checkpoint from
which to start recovering the index file;<br/>
2. Because the file structure changes, before releasing the version that
contains this feature we first need to release a version of the current
format that can also read the new format. This solves the rollback problem if
the upgrade to the feature version goes wrong. I think this is a one-time
operation, and the price is worth it.</p>
<i>JIRA link - <a
href="https://issues.apache.org/jira/browse/INLONG-124">[INLONG-124]</a>
created by gosonzhang</i>