[ 
https://issues.apache.org/jira/browse/PHOENIX-1281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14147140#comment-14147140
 ] 

Lars Hofhansl edited comment on PHOENIX-1281 at 9/25/14 12:10 AM:
------------------------------------------------------------------

Ran some queries over a table with 5 integer columns (one of which is the key), 
with this patch, PHOENIX-1280, and HBASE-12077 (which is now in 0.98) applied. All 
columns except the key are set to 1 in every row.

All times in seconds. 40m rows:

||query|| w/ patches || w/o patches||
|select count(\*) from x| 8.8 | 8.8|
|select distinct(v1) from x| 13 | 15|
|select distinct(v1) from x where v1 = 1| 21 | 26|
|select v1 from x where v1 <> 1| 50 | >60 *|
|select distinct(v1) from x where v1 <> 1| 50 | >60 *|
|select count(\*) from x where v1 <> 1| 45 | 50|
|select count(\*) from x where v1 = 1| 9.5 | 10.8|
\* timed out

The asynchronous background work of the GC is a bit hard to measure in just a 
few runs, so I also measured the number of objects created while running 
{{select distinct(v1) from x where v1 = 1}} over 5m rows of the same table.

W/O the patches we create:
* 5m immutable lists
* 60m ImmutableBytePtr
* 111m ArrayList$Itrs

W/ the patches we create:
* 0 immutable lists (PHOENIX-1281, this issue)
* 5m ImmutableBytePtr (PHOENIX-1280)
* 0 ArrayList$Itrs (HBASE-12077)

So we saved the creation of *171m* objects for just a 5m row run, or ~34 
objects for each row scanned.
These are objects created on the region server.
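
The arithmetic behind the 171m figure can be checked with a small sketch. Only the counts come from the measurements above; the class and method names are made up for illustration.

```java
public class ObjectSavings {
    // Rows scanned in the measured run, in millions.
    static final long ROWS_MILLIONS = 5;

    // Sum of the three object counts (all in millions).
    public static long total(long immutableLists, long bytePtrs, long iterators) {
        return immutableLists + bytePtrs + iterators;
    }

    public static void main(String[] args) {
        long unpatched = total(5, 60, 111); // counts without the patches
        long patched = total(0, 5, 0);      // counts with all three patches
        long saved = unpatched - patched;

        System.out.println("objects saved (millions): " + saved);
        System.out.println("objects saved per row: " + (double) saved / ROWS_MILLIONS);
    }
}
```

This prints 171 for the objects saved and 34.2 per row, matching the ~34 quoted above.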

[~giacomotaylor], [~apurtell], FYI.



> Each MultiKeyValueTuple.setKeyValues creates a new immutable list object
> ------------------------------------------------------------------------
>
>                 Key: PHOENIX-1281
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1281
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>         Attachments: 1281-v1.txt
>
>
> I looked through all callers of this method, and in each case we have a fresh 
> List object anyway; hence the wrapping is not necessary, and removing it saves 
> at least one new object per row scanned.
> This is probably not really critical, but for a sizable COUNT(\*) or other 
> aggregate it still creates a lot of unnecessary objects.
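
As a rough illustration of the idea in the description, here is a minimal sketch. It is not the actual Phoenix code: {{TupleSketch}} and both setters are hypothetical, and {{Collections.unmodifiableList}} merely stands in for whatever immutable wrapper the real method uses. It assumes, as the description states, that every caller hands over a fresh list that it does not modify afterwards.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a tuple that is re-populated once per scanned row.
public class TupleSketch {
    private List<String> values = Collections.emptyList();

    // Defensive variant: allocates one extra wrapper object on every call.
    public void setValuesWrapped(List<String> fresh) {
        this.values = Collections.unmodifiableList(fresh);
    }

    // Direct variant: keeps the caller's list as-is, so no extra allocation.
    // Only safe under the assumption that the caller never touches the list again.
    public void setValuesDirect(List<String> fresh) {
        this.values = fresh;
    }

    public List<String> getValues() {
        return values;
    }

    public static void main(String[] args) {
        TupleSketch tuple = new TupleSketch();
        List<String> row = new ArrayList<>();
        row.add("v1");

        tuple.setValuesWrapped(row);
        System.out.println("wrapped is same object: " + (tuple.getValues() == row));

        tuple.setValuesDirect(row);
        System.out.println("direct is same object: " + (tuple.getValues() == row));
    }
}
```

The trade-off is that the direct variant gives up the protection against later modification by the caller, which is why it matters that all callers were checked.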



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
