[ 
https://issues.apache.org/jira/browse/LENS-1381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15743967#comment-15743967
 ] 

Puneet Gupta commented on LENS-1381:
------------------------------------

Design for this requirement:

*Current rewrite flow*
- The rewrite flow currently relies on Set<CandidateFact> and 
Set<Set<CandidateFact>>, which represent, respectively, the participating facts 
and the combinations of facts (in case of a join between two or more facts) 
that can answer the user query.

- Set<CandidateFact> is initially populated on the assumption that all facts 
will participate, and Set<Set<CandidateFact>> is created based on the joins 
that are required to answer the query (with the assumption that two facts can 
be joined if they have the dimensions that are being queried by the user; after 
joining the facts, the queried measures, which are split across facts, are 
picked). Along the rewrite flow, these data structures are pruned based on 
column availability, data availability, storage validity, fact validity, cost, 
etc. Finally, one CandidateFact combination is picked from 
Set<Set<CandidateFact>>.

- To write the rewritten query for the picked candidate combination, one of the 
following contexts is created:
-- SingleFactSingleStorageHQLContext (candidate combination has a single fact 
and a single storage)
-- SingleFactMultiStorageHQLContext (candidate combination has a single fact 
and multiple storages within that fact - Union Query)
-- MultiFactHQLContext (candidate combination has multiple facts - Join Query)
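
As an illustration of the current flow described above, here is a minimal Java sketch of pruning Set<Set<CandidateFact>> and choosing the kind of context to write with. It is not the Lens implementation: the CandidateFact class below is a simplified stand-in, and ContextKind, pruneCombinations and pickContext are hypothetical names introduced only for this example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

class CurrentFlowSketch {
  // Hypothetical, simplified stand-in for CandidateFact (not the Lens class).
  static final class CandidateFact {
    final String name;
    final Set<String> storages;

    CandidateFact(String name, String... storages) {
      this.name = name;
      this.storages = new HashSet<>(Arrays.asList(storages));
    }
  }

  // Hypothetical labels mirroring the three HQL contexts named in the text.
  enum ContextKind { SINGLE_FACT_SINGLE_STORAGE, SINGLE_FACT_MULTI_STORAGE, MULTI_FACT }

  // Keep only the combinations whose facts all survived pruning.
  static Set<Set<CandidateFact>> pruneCombinations(
      Set<Set<CandidateFact>> combinations, Set<CandidateFact> survivingFacts) {
    Set<Set<CandidateFact>> kept = new LinkedHashSet<>();
    for (Set<CandidateFact> combination : combinations) {
      if (survivingFacts.containsAll(combination)) {
        kept.add(combination);
      }
    }
    return kept;
  }

  // Decide which kind of context the picked (non-empty) combination needs.
  static ContextKind pickContext(Set<CandidateFact> combination) {
    if (combination.size() > 1) {
      return ContextKind.MULTI_FACT; // join query across facts
    }
    CandidateFact only = combination.iterator().next();
    return only.storages.size() > 1
        ? ContextKind.SINGLE_FACT_MULTI_STORAGE   // union query within one fact
        : ContextKind.SINGLE_FACT_SINGLE_STORAGE;
  }

  public static void main(String[] args) {
    CandidateFact f1 = new CandidateFact("fact1", "s1", "s2");
    CandidateFact f2 = new CandidateFact("fact2", "s3");
    Set<Set<CandidateFact>> combos = new LinkedHashSet<>();
    combos.add(new HashSet<>(Arrays.asList(f1)));
    combos.add(new HashSet<>(Arrays.asList(f1, f2)));
    // Suppose fact2 was pruned (for example, for missing columns).
    Set<Set<CandidateFact>> kept =
        pruneCombinations(combos, new HashSet<>(Arrays.asList(f1)));
    System.out.println(kept.size() + " combination(s) left, context: "
        + pickContext(kept.iterator().next()));
  }
}
```

Note that this structure has no way to express a union of storages that belong to different facts, which is the gap the new flow addresses.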

*New Flow*
# The new flow will work at Storage level and will use a list of 
StorageCandidates. Initially all Storages are candidates. 
# The list of StorageCandidates is pruned based on column availability, storage 
validity, fact validity, update period validity, etc.
# The StorageCandidates are then grouped to ensure that a group can cover the 
entire time range queried by the user. It's possible for a group to have a 
single StorageCandidate in case this storage alone can cover the time ranges 
queried. If a group has more than one storage, then this group is represented 
as a UnionCandidate.
# The groups created in step 3 (UnionCandidates and StorageCandidates) are 
used to find a measure-covering group such that the members of this group cover 
all the measures queried by the user. Again, it's possible for this group to 
have a single member (which can be a StorageCandidate or a UnionCandidate) that 
can answer all the measures. If the group has more than one member, then that 
group is represented as a JoinCandidate.
# JoinCandidate, UnionCandidate and StorageCandidate extend the same Candidate 
interface.
# The groups created in step 4 are further pruned based on data availability, 
cost, etc., and a winning group (Candidate) is picked.
# The query is then written for this winning Candidate.
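
The grouping in steps 3 to 5 can be sketched in Java as below. This is only an illustration under stated assumptions, not the proposed Lens code: the method signatures on Candidate, the integer day ranges, the rule that a union answers only the measures common to all its storages, and the helpers asTimeCover and asMeasureCover are all hypothetical and chosen to keep the example small.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class NewFlowSketch {
  // Hypothetical common interface; the real Candidate interface may differ.
  interface Candidate {
    Set<String> measures();            // measures this candidate can answer
    boolean covers(int from, int to);  // covers the day range [from, to)?
  }

  static final class StorageCandidate implements Candidate {
    final String name;
    final int start, end;              // data available for days [start, end)
    final Set<String> measures;

    StorageCandidate(String name, int start, int end, String... measures) {
      this.name = name;
      this.start = start;
      this.end = end;
      this.measures = new HashSet<>(Arrays.asList(measures));
    }
    public Set<String> measures() { return measures; }
    public boolean covers(int from, int to) { return start <= from && to <= end; }
  }

  static final class UnionCandidate implements Candidate {
    final List<StorageCandidate> children;   // must be non-empty

    UnionCandidate(List<StorageCandidate> children) { this.children = new ArrayList<>(children); }

    // Assumption: a union answers only measures present in every child.
    public Set<String> measures() {
      Set<String> common = new HashSet<>(children.get(0).measures());
      for (StorageCandidate c : children) common.retainAll(c.measures());
      return common;
    }
    // Sweep the children in start order and fail on the first gap.
    public boolean covers(int from, int to) {
      List<StorageCandidate> sorted = new ArrayList<>(children);
      sorted.sort(Comparator.comparingInt(c -> c.start));
      int cursor = from;
      for (StorageCandidate c : sorted) {
        if (c.start > cursor) break;
        cursor = Math.max(cursor, c.end);
      }
      return cursor >= to;
    }
  }

  static final class JoinCandidate implements Candidate {
    final List<Candidate> children;

    JoinCandidate(List<Candidate> children) { this.children = new ArrayList<>(children); }

    public Set<String> measures() {
      Set<String> all = new HashSet<>();
      for (Candidate c : children) all.addAll(c.measures());
      return all;
    }
    public boolean covers(int from, int to) {
      for (Candidate c : children) if (!c.covers(from, to)) return false;
      return true;
    }
  }

  // Step 3: one storage if it suffices, else a union, else null (no cover).
  static Candidate asTimeCover(List<StorageCandidate> group, int from, int to) {
    if (group.isEmpty()) return null;
    for (StorageCandidate s : group) if (s.covers(from, to)) return s;
    UnionCandidate union = new UnionCandidate(group);
    return union.covers(from, to) ? union : null;
  }

  // Step 4: one member if it suffices, else a join, else null (no cover).
  static Candidate asMeasureCover(List<Candidate> group, Set<String> queried) {
    for (Candidate c : group) if (c.measures().containsAll(queried)) return c;
    JoinCandidate join = new JoinCandidate(group);
    return join.measures().containsAll(queried) ? join : null;
  }
}
```

For example, with storages s1 (days 0 to 10, measure m1), s2 (days 10 to 20, measure m1) and s3 (days 0 to 20, measure m2), a query over days 0 to 20 for m1 and m2 would wrap s1 and s2 in a UnionCandidate and then join that union with s3 in a JoinCandidate. Because all three types share one interface, the later pruning and query-writing steps can treat the winning candidate uniformly.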


 



> Support Fact to Fact Union
> --------------------------
>
>                 Key: LENS-1381
>                 URL: https://issues.apache.org/jira/browse/LENS-1381
>             Project: Apache Lens
>          Issue Type: New Feature
>            Reporter: Puneet Gupta
>
> Currently Lens supports Union-ing data across different storages in a single 
> Fact. With this JIRA Lens server will be able to Union Data Across Facts too.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
