[jira] [Commented] (PHOENIX-3560) Aggregate query performance is worse with encoded columns for schema with large number of columns

Samarth Jain (JIRA) Tue, 10 Jan 2017 18:30:07 -0800

    [ 
https://issues.apache.org/jira/browse/PHOENIX-3560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15816901#comment-15816901
 ]


Samarth Jain commented on PHOENIX-3560:
---------------------------------------

We use a SINGLE_KEYVALUE_COLUMN_QUALIFIER "1", which sorts after our empty 
key value column 0 (I should probably change it to use the Integer 
representation of 1).
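HBase orders qualifiers within a row by unsigned lexicographic byte comparison. The sketch below compares the qualifier "1" stored as a string with the integer 1 stored as bytes, checking each against the empty column 0. It assumes, for illustration only, a hypothetical 4-byte big-endian integer encoding for both the empty column and the packed column. Phoenix's actual qualifier encoding may differ.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class QualifierOrder {
    // Unsigned lexicographic comparison: the order HBase uses for qualifiers.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length; // a shorter prefix sorts first
    }

    // Hypothetical encoding: qualifier stored as a 4-byte big-endian int.
    static byte[] intQualifier(int value) {
        return ByteBuffer.allocate(4).putInt(value).array(); // ByteBuffer defaults to big-endian
    }

    public static void main(String[] args) {
        byte[] emptyColumn = intQualifier(0);
        byte[] packedAsString = "1".getBytes(StandardCharsets.UTF_8); // single byte 0x31
        byte[] packedAsInt = intQualifier(1);                         // bytes 00 00 00 01
        // Both representations sort after the empty column.
        System.out.println(compareUnsigned(emptyColumn, packedAsString) < 0);
        System.out.println(compareUnsigned(emptyColumn, packedAsInt) < 0);
    }
}
```

Under these assumptions both lines print true. The empty key value therefore comes first in the row with either representation of the packed column.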

[~mujtabachohan] and I tested this out offline, and it turned out that 
increasing the block cache size sped up the query. It runs 2x faster than 
against the non-encoded immutable table. 

[~lhofhansl] pointed out that HBase automatically grows a block beyond the 
configured block size (64K by default) so that it can fit a key value. He 
mentioned that what is likely happening here is that the "empty" key value and 
the packed key value both end up in the same block, which is much larger than 
64K. As a result, we cannot really take advantage of the first key only 
filter, since we always have to read this entire large block before we can 
skip to the next row.
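The effect described above can be illustrated with a rough model. Cells are appended to a block, and a new block is started only once the current one has reached the block size. A single cell larger than the block size therefore inflates its block. For each row, the model reports the size of the block holding the row's first cell, which is roughly what must be read before skipping to the next row. This is a simplified sketch, not HBase's actual HFile writer. The 10-byte empty cell size is an assumption. Key overhead, compression and the block index are ignored. The numbers are not a prediction of the measured 5X slowdown.

```java
import java.util.ArrayList;
import java.util.List;

public class BlockModel {
    static final int BLOCK_SIZE = 64 * 1024; // default HBase block size (64K)
    static final int EMPTY_CELL_BYTES = 10;  // assumed size of the "empty" key value
    static final int COLUMNS = 5000;         // schema from the issue description
    static final int VALUE_BYTES = 200;      // 200 chars per column

    // Average size of the block that holds each row's first (empty) cell.
    static long bytesReadPerRow(int rows, boolean encoded) {
        List<Long> blockSizes = new ArrayList<>();
        int[] firstCellBlock = new int[rows];
        long current = 0;
        for (int r = 0; r < rows; r++) {
            // Encoded: empty cell + one packed cell. Non-encoded: empty cell + one cell per column.
            int cellCount = encoded ? 2 : 1 + COLUMNS;
            for (int i = 0; i < cellCount; i++) {
                long cellBytes = (i == 0) ? EMPTY_CELL_BYTES
                        : (encoded ? (long) COLUMNS * VALUE_BYTES : VALUE_BYTES);
                // The block boundary is checked before each append, so a
                // block closes only once it has reached BLOCK_SIZE.
                if (current >= BLOCK_SIZE) {
                    blockSizes.add(current);
                    current = 0;
                }
                if (i == 0) {
                    firstCellBlock[r] = blockSizes.size(); // index of the open block
                }
                current += cellBytes;
            }
        }
        blockSizes.add(current); // flush the last block
        long total = 0;
        for (int idx : firstCellBlock) {
            total += blockSizes.get(idx);
        }
        return total / rows;
    }

    public static void main(String[] args) {
        System.out.println("non-encoded: " + bytesReadPerRow(10, false) + " bytes/row");
        System.out.println("encoded:     " + bytesReadPerRow(10, true) + " bytes/row");
    }
}
```

In this model the non-encoded layout reads one block of about 64K per row. In the encoded layout, each row's empty and packed cells share a single block of about 1MB, and the whole block must be read before the next row can be reached.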

> Aggregate query performance is worse with encoded columns for schema with 
> large number of columns
> -------------------------------------------------------------------------------------------------
>
>                 Key: PHOENIX-3560
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3560
>             Project: Phoenix
>          Issue Type: Sub-task
>            Reporter: Mujtaba Chohan
>            Assignee: Thomas D'Silva
>             Fix For: 4.10.0
>
>         Attachments: DataGenerator.java, PHOENIX-3565.patch
>
>
> Schema with 5K columns
> {noformat}
> create table (k1 integer, k2 integer, c1 varchar ... c5000 varchar CONSTRAINT 
> PK PRIMARY KEY (K1, K2)) 
> VERSIONS=1, MULTI_TENANT=true, IMMUTABLE_ROWS=true
> {noformat}
> In this test, there are no null columns and each column contains 200 chars, 
> i.e. about 1MB of data per row (5000 x 200 bytes).
> Count * aggregation is about 5X slower with encoded columns than with 
> non-encoded columns using the same schema.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
