[ https://issues.apache.org/jira/browse/ASTERIXDB-2115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16247770#comment-16247770 ]

ASF subversion and git services commented on ASTERIXDB-2115:
------------------------------------------------------------

Commit 39390edc9d9a6a95fd312acf63fee9801c17a98b in asterixdb's branch 
refs/heads/master from [~luochen01]
[ https://git-wip-us.apache.org/repos/asf?p=asterixdb.git;h=39390ed ]

[ASTERIXDB-2115] Add Component Ids to LSM Indexes

- user model changes: no
- storage format changes: no
- interface changes: yes

Details:
- Add LSMComponentId to all LSM components. Component Ids are managed
through IO operation callbacks.
- For a memory component, its ID is reset every time it is recycled.
- For a disk component, its ID is copied from the source component(s)
during flush/merge.
- For the indexes of a dataset, we need to guarantee that all their memory
components receive the same ID. This is achieved using a shared
component Id generator.
- Fix the memory component recycled callback so that it is called only
when the memory component has actually been recycled.
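
The Id lifecycle described above can be sketched as follows. This is a minimal, hypothetical Python model, not AsterixDB's Java code: `ComponentId`, `SharedComponentIdGenerator`, `MemoryComponent`, and `flush` are illustrative names, and representing an Id as an inclusive `[min_id, max_id]` range (so that a merged disk component can cover all of its source components) is an assumption.

```python
from dataclasses import dataclass
from itertools import count
from typing import Iterable


@dataclass(frozen=True)
class ComponentId:
    """Hypothetical component Id, modeled as an inclusive [min_id, max_id] range."""
    min_id: int
    max_id: int

    @staticmethod
    def merge(ids: Iterable["ComponentId"]) -> "ComponentId":
        # Assumption: a merged disk component covers the Id range of all its sources.
        ids = list(ids)
        return ComponentId(min(i.min_id for i in ids), max(i.max_id for i in ids))


class SharedComponentIdGenerator:
    """Hypothetical generator shared by all indexes of one dataset partition."""

    def __init__(self) -> None:
        self._counter = count(1)
        self._current = next(self._counter)

    def current(self) -> ComponentId:
        # Every index that asks between two refreshes sees the same Id.
        return ComponentId(self._current, self._current)

    def refresh(self) -> None:
        # Advance once per flush so the next generation of memory components gets a new Id.
        self._current = next(self._counter)


class MemoryComponent:
    def __init__(self, gen: SharedComponentIdGenerator) -> None:
        self._gen = gen
        self.id = gen.current()

    def recycled(self) -> None:
        # Invoked only after the component has actually been recycled: pick up a fresh Id.
        self.id = self._gen.current()


def flush(mem: MemoryComponent) -> ComponentId:
    # The flushed disk component inherits the memory component's Id unchanged.
    return mem.id
```

For example, a primary and a secondary `MemoryComponent` built from the same generator both start with `ComponentId(1, 1)`; after a flush, `gen.refresh()`, and `recycled()` on each, both move to `ComponentId(2, 2)`, and merging the two flushed disk components yields `ComponentId(1, 2)`. Sharing one generator per partition, rather than one per index, is what keeps the memory components of all indexes in lockstep.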

A design wiki for this patch: https://cwiki.apache.org/confluence/display/ASTERIXDB/Component+Id-based+secondary-to-primary+index+acceleration

Change-Id: I8aec6261a84a0729ce35f4b1cb708be299ddb98d
Reviewed-on: https://asterix-gerrit.ics.uci.edu/2025
Sonar-Qube: Jenkins
Integration-Tests: Jenkins
Tested-by: Jenkins
Reviewed-by: abdullah alamoudi
Contrib: Jenkins


> Component Id-based secondary to primary acceleration
> ----------------------------------------------------
>
>                 Key: ASTERIXDB-2115
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-2115
>             Project: Apache AsterixDB
>          Issue Type: Improvement
>            Reporter: Chen Luo
>            Assignee: Chen Luo
>
> Previously, after obtaining a list of primary keys from a secondary index, we
> performed point lookups against the primary index to fetch the records. When
> the number of disk components is large, we need to perform many unnecessary
> component searches because of Bloom filter false positives. However, since the
> memory components of all indexes of a dataset are always flushed together, we
> can narrow down the candidate components of the primary index based on the
> secondary index component in which the primary key was found.
> To enable this optimization, we first need to assign a unique Id to every
> component (both disk and memory) and guarantee that all memory components
> of a dataset (partition) receive the same Id upon creation. These component
> Ids are propagated to the primary index during query processing to facilitate
> primary index lookups.
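
The narrowing step can be sketched in the same spirit. This hypothetical Python sketch assumes that a key's secondary entry and its latest record version are written in the same flush generation, so only primary components whose Id range overlaps the Id range of the secondary component that produced the key need to be searched. The overlap rule, the names `DiskComponent`, `candidate_components`, and `lookup`, and the dict standing in for a B+-tree with a Bloom filter are all illustrative assumptions; the exact pruning rule AsterixDB applies may differ (see the design wiki).

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class ComponentId:
    """Hypothetical inclusive [min_id, max_id] range of flush generations."""
    min_id: int
    max_id: int

    def overlaps(self, other: "ComponentId") -> bool:
        return self.min_id <= other.max_id and other.min_id <= self.max_id


@dataclass
class DiskComponent:
    cid: ComponentId
    records: Dict[Any, Any]  # stands in for a B+-tree plus its Bloom filter


def candidate_components(primary: List[DiskComponent],
                         secondary_cid: ComponentId) -> List[DiskComponent]:
    # Assumed pruning rule: skip primary components whose Id range does not
    # overlap the secondary component where the key was found.
    return [c for c in primary if c.cid.overlaps(secondary_cid)]


def lookup(primary: List[DiskComponent], pkey: Any,
           secondary_cid: ComponentId) -> Optional[Any]:
    """Point lookup over `primary` (ordered newest first), restricted by component Id."""
    for comp in candidate_components(primary, secondary_cid):
        if pkey in comp.records:  # a real index would probe its Bloom filter first
            return comp.records[pkey]
    return None
```

With primary components `[5, 6]`, `[3, 4]`, and `[1, 2]` (newest first) and a key found in a secondary component with Id `[3, 3]`, only the `[3, 4]` component is searched, so Bloom filter false positives in the other two can no longer trigger wasted searches.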



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
