[
https://issues.apache.org/jira/browse/ASTERIXDB-2115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16247770#comment-16247770
]
ASF subversion and git services commented on ASTERIXDB-2115:
------------------------------------------------------------
Commit 39390edc9d9a6a95fd312acf63fee9801c17a98b in asterixdb's branch
refs/heads/master from [~luochen01]
[ https://git-wip-us.apache.org/repos/asf?p=asterixdb.git;h=39390ed ]
[ASTERIXDB-2115] Add Component Ids to LSM Indexes
- user model changes: no
- storage format changes: no
- interface changes: yes
Details:
- Add LSMComponentId to all LSM components. Component Ids are managed
through IO operation callbacks.
- For a memory component, its ID is reset every time it is recycled.
- For a disk component, its ID is copied from the source component(s)
during flush/merge.
- For the indexes of a dataset, we need to guarantee that all their memory
components receive the same ID. This is achieved using a shared
component ID generator.
- Fix the memory component recycled callback so that it is called only
when the memory component has actually been recycled.
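A minimal Java sketch of the ID lifecycle listed above, assuming an ID is modeled as an inclusive [minId, maxId] range of increasing values (an assumption; the commit does not describe the representation). `ComponentId`, `SharedComponentIdGenerator`, and `MemoryComponent` are hypothetical names, not AsterixDB's actual classes.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical: a component ID modeled as an inclusive range [minId, maxId].
final class ComponentId {
    final long minId;
    final long maxId;

    ComponentId(long minId, long maxId) {
        if (minId > maxId) {
            throw new IllegalArgumentException("minId > maxId");
        }
        this.minId = minId;
        this.maxId = maxId;
    }

    // Merge: the new disk component's ID spans all of its source components' IDs.
    static ComponentId merge(List<ComponentId> sources) {
        if (sources.isEmpty()) {
            throw new IllegalArgumentException("merge needs at least one source");
        }
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        for (ComponentId id : sources) {
            min = Math.min(min, id.minId);
            max = Math.max(max, id.maxId);
        }
        return new ComponentId(min, max);
    }
}

// Hypothetical: one generator per dataset partition, shared by all of its indexes,
// so every memory component that takes an ID between two refreshes gets the same ID.
final class SharedComponentIdGenerator {
    private final AtomicLong counter = new AtomicLong();
    private volatile long current = counter.incrementAndGet();

    // Assumed to be called once per flush cycle, after all indexes have flushed.
    void refresh() {
        current = counter.incrementAndGet();
    }

    long currentId() {
        return current;
    }
}

// Hypothetical memory component that takes its ID from the shared generator.
final class MemoryComponent {
    private final SharedComponentIdGenerator generator;
    private ComponentId id;

    MemoryComponent(SharedComponentIdGenerator generator) {
        this.generator = generator;
        resetId();
    }

    // Called only once the component has actually been recycled.
    void recycled() {
        resetId();
    }

    // Flush: the resulting disk component copies this component's ID unchanged.
    ComponentId flush() {
        return id;
    }

    ComponentId id() {
        return id;
    }

    private void resetId() {
        long v = generator.currentId();
        id = new ComponentId(v, v);
    }
}
```

For example, if a primary and a secondary index share one generator, both memory components start with ID [1, 1]. After both flush, `refresh()` runs, and both are recycled, each holds [2, 2], and merging the two flushed primary components yields [1, 2].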
A design wiki for this patch: https://cwiki.apache.org/confluence/display/ASTERIXDB/Component+Id-based+secondary-to-primary+index+acceleration
Change-Id: I8aec6261a84a0729ce35f4b1cb708be299ddb98d
Reviewed-on: https://asterix-gerrit.ics.uci.edu/2025
Sonar-Qube: Jenkins
Integration-Tests: Jenkins
Tested-by: Jenkins
Reviewed-by: abdullah alamoudi
Contrib: Jenkins
> Component Id-based secondary to primary acceleration
> ----------------------------------------------------
>
> Key: ASTERIXDB-2115
> URL: https://issues.apache.org/jira/browse/ASTERIXDB-2115
> Project: Apache AsterixDB
> Issue Type: Improvement
> Reporter: Chen Luo
> Assignee: Chen Luo
>
> Previously, after we obtained a list of primary keys from a secondary index,
> we performed point lookups against the primary index to fetch the records.
> When the number of disk components is large, many of these lookups search
> components unnecessarily because of Bloom filter false positives. However,
> since the memory components of all indexes of a dataset are always flushed
> together, we can narrow down the candidate components of the primary index
> based on the secondary index component in which the primary key was found.
> To enable this optimization, we first need to assign a unique ID to every
> component (both disk and memory) and guarantee that all memory components
> of a dataset (partition) receive the same ID upon creation. These component
> IDs are propagated to the primary index during query processing to facilitate
> primary index lookups.
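A minimal Java sketch of the candidate narrowing described in the issue, assuming component IDs are inclusive [minId, maxId] ranges (an assumption) and that a primary key found in a secondary component with range [a, b] was written during a flush cycle within [a, b]. It keeps only the primary disk components whose ranges overlap [a, b], i.e. those that could hold the record version behind that secondary entry. `DiskComponent` and `CandidatePruning.candidates` are hypothetical names; whether the actual implementation also probes newer primary components (for example, to see later updates of the same key) is not shown here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical disk component descriptor with an inclusive ID range.
final class DiskComponent {
    final String name;
    final long minId;
    final long maxId;

    DiskComponent(String name, long minId, long maxId) {
        this.name = name;
        this.minId = minId;
        this.maxId = maxId;
    }

    // Two inclusive ranges overlap iff each starts no later than the other ends.
    boolean overlaps(long lo, long hi) {
        return minId <= hi && lo <= maxId;
    }
}

final class CandidatePruning {
    // Keeps, in order, the primary components whose ID range overlaps the ID range
    // of the secondary component where the primary key was found.
    static List<DiskComponent> candidates(List<DiskComponent> primary, DiskComponent secondaryHit) {
        List<DiskComponent> result = new ArrayList<>();
        for (DiskComponent c : primary) {
            if (c.overlaps(secondaryHit.minId, secondaryHit.maxId)) {
                result.add(c);
            }
        }
        return result;
    }
}
```

For example, with primary components P1 [1, 3], P2 [4, 6], P3 [7, 7], and P4 [8, 8], a key found in a secondary component S [4, 7] needs only P2 and P3 probed; P1 and P4 are skipped, along with any Bloom filter false positives they would have produced.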