[ 
https://issues.apache.org/jira/browse/PHOENIX-36?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14134936#comment-14134936
 ] 

Lars Hofhansl commented on PHOENIX-36:
--------------------------------------

I'd like to pick this up again if there's interest.

This would even be interesting if we had intra-region parallelism as we'd still 
want to set the number of threads based on the number of region servers 
involved. The intra-region parallelism will solve the timeouts. The other route 
is to define the timeouts such that they cover the worst case, i.e. how long 
does it take for a single thread to scan an entire 20 GB region? Since in 
HBase we usually add a unit of RAM/CPU with each unit of disk space, we only 
need to support that worst case.

For this feature, there should be a default value (probably just 1.0) and then 
the PARALLEL(N) hint. The only problem is that this might be a bit hard to 
explain, e.g. what does PARALLEL(0.1) mean? (It means use 1 thread/scan for 
every 10 involved region servers, but will anybody understand this?)
We probably want to bound the value range. Maybe from 0.01 (very low priority 
background query) to 100 (might bind 100 handler threads on each involved 
region server).
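
To make the arithmetic concrete, here is a minimal sketch of how a bounded 
scaling factor could be turned into a thread count. This is not Phoenix code: 
the class, method, and constant names are hypothetical, and rounding to the 
nearest integer with a floor of one thread is an assumption that the proposal 
does not specify.

```java
// Hypothetical sketch, not part of Phoenix.
final class ParallelScaling {
    // Bounds and default taken from the values floated in this comment.
    static final double MIN_FACTOR = 0.01;
    static final double MAX_FACTOR = 100.0;
    static final double DEFAULT_FACTOR = 1.0;

    // Clamp a user-supplied PARALLEL(N) value into the allowed range.
    static double clampFactor(double factor) {
        return Math.max(MIN_FACTOR, Math.min(MAX_FACTOR, factor));
    }

    // Threads for one query = factor * number of involved region servers.
    // Assumption: round to nearest and never go below one thread.
    static int threadsFor(double factor, int involvedRegionServers) {
        if (involvedRegionServers <= 0) {
            throw new IllegalArgumentException("need at least one region server");
        }
        long threads = Math.round(clampFactor(factor) * involvedRegionServers);
        return (int) Math.max(1L, threads);
    }

    public static void main(String[] args) {
        // The examples from the issue description, for 10 region servers.
        System.out.println(threadsFor(DEFAULT_FACTOR, 10)); // 10
        System.out.println(threadsFor(0.1, 10));            // 1
        System.out.println(threadsFor(10.0, 10));           // 100
    }
}
```

With the floor of one thread, PARALLEL(0.01) on a 10-server query still gets a 
single thread rather than zero; whether that is the right behaviour for a 
"very low priority" query is an open design question.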

> Parallel Scaling
> ----------------
>
>                 Key: PHOENIX-36
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-36
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>
> Right now the parallel scaling is defined by a constant (I think 32) that 
> defines the number of threads/splits that can drive a single query.
> This number might be too large for a small cluster and too small for a large 
> cluster; and this value should change as a cluster grows.
> One idea is to instead have a "scaling number". This would be a floating 
> point number that defines the number of threads to use per involved 
> RegionServer.
> Say a query touches 10 RegionServers, then a scaling factor
> * of 1.0 would mean 10 threads
> * 0.1 means 1 thread
> * 10.0 means 100 threads
> * etc
> That way one can define the cost of a query in terms of cluster resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
