[
https://issues.apache.org/jira/browse/PHOENIX-4925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16881510#comment-16881510
]
Hadoop QA commented on PHOENIX-4925:
------------------------------------
{color:red}-1 overall{color}. Here are the results of testing the latest
attachment
http://issues.apache.org/jira/secure/attachment/12974090/PHOENIX-4925.phoenix-stats.003.patch
against master branch at commit 3b31923d4c698ee2a38f5b9db104892a194e2131.
ATTACHMENT ID: 12974090
{color:green}+1 @author{color}. The patch does not contain any @author
tags.
{color:green}+1 tests included{color}. The patch appears to include 82 new
or modified tests.
{color:red}-1 patch{color}. The patch command could not apply the patch.
Console output:
https://builds.apache.org/job/PreCommit-PHOENIX-Build/2751//console
This message is automatically generated.
> Use a Variant Segment tree to organize Guide Post Info
> ------------------------------------------------------
>
> Key: PHOENIX-4925
> URL: https://issues.apache.org/jira/browse/PHOENIX-4925
> Project: Phoenix
> Issue Type: Improvement
> Reporter: Bin Shi
> Assignee: Bin Shi
> Priority: Major
> Attachments: PHOENIX-4925.phoenix-stats.003.patch,
> PHOENIX-4925.phoenix-stats.0502.patch, PHOENIX-4925.phoenix-stats.0510.patch
>
> Time Spent: 13h 40m
> Remaining Estimate: 0h
>
> As reported, Query compilation (for the sample queries showed below),
> especially deriving estimation and generating parallel scans from guide
> posts, becomes much slower after we introduced Phoenix Stats.
> a. SELECT f1__c FROM MyCustomBigObject__b ORDER BY Pk1__c
> b. SELECT f1__c FROM MyCustomBigObject__b WHERE nonpk1__c = ‘x’ ORDER BY
> Pk1__c
> c. SELECT f1__c FROM MyCustomBigObject__b WHERE pk2__c = ‘x’ ORDER BY
> pk1__c,pk2__c
> d. SELECT f1__c FROM MyCustomBigObject__b WHERE pk1__c = ‘x’ AND nonpk1__c
> ORDER BY pk1__c,pk2__c
> e. SELECT f1__c FROM MyCustomBigObject__b WHERE pk__c >= 'd' AND pk__c <=
> 'm' OR pk__c >= 'o' AND pk__c <= 'x' ORDER BY pk__c // pk__c is the only
> column to make the primary key.
>
> By using prefix encoding for guide post info, we have to decode and traverse
> guide posts sequentially, which causes time complexity in
> BaseResultIterators.getParallelScan(...) to be O( n ) , where n is the total
> count of guide posts.
> According to PHOENIX-2417, to reduce footprint in client cache and over
> transmition, the prefix encoding is used as in-memory and over-the-wire
> encoding for guide post info.
> We can use Segment Tree to address both memory and performance concerns. The
> guide posts are partitioned to k chunks (k=1024?), each chunk is encoded by
> prefix encoding and the encoded data is a leaf node of the tree. The inner
> node contains summary info (the count of rows, the data size) of the sub tree
> rooted at the inner node.
> With this tree like data structure, compared to the current data structure,
> the increased size (mainly coming from the n/k-1 inner nodes) is ignorable.
> The time complexity for queries a, b, c can be reduced to O(m) where m is the
> total count of regions; the time complexity for "EXPLAN" queries a, b, c can
> be reduced to O(m) too, and if we support "EXPLAIN (ESTIMATE ONLY)", it can
> even be reduced to O(1). For queries d and e, the time complexity to find the
> start of target scan ranges can be reduced to O(log(n/k)).
> The tree can also integrate AVL and B+ characteristics to support partial
> load/unload when interacting with stats client cache.
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)