[ 
https://issues.apache.org/jira/browse/PHOENIX-2292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14959625#comment-14959625
 ] 

Ravi Kishore Valeti edited comment on PHOENIX-2292 at 10/15/15 9:31 PM:
------------------------------------------------------------------------

Setup: 8-node cluster
Data table: 1B-row wide table (20 columns)
NodeManager max memory configured: 10 GB
MapReduce max memory configured: 2 GB
Input splits: 128
Parallel mappers run: 39
Time taken: 720 minutes, compared with 1,450 minutes earlier.

The run completed in 12 hours, with an average map time of 3.5 hours.
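
As a rough consistency check on these figures, the sketch below estimates the
job duration from the split count, the mapper parallelism, and the average map
time. It assumes mappers run in roughly equal "waves" and that every map task
takes about the average time. Neither assumption is stated in this comment, and
the variable names are illustrative only.

```python
import math

# Figures reported in the comment above.
input_splits = 128
parallel_mappers = 39
avg_map_hours = 3.5
earlier_minutes = 1450
current_minutes = 720

# Assumption: mappers are scheduled in waves of `parallel_mappers` tasks.
waves = input_splits / parallel_mappers

# Estimate if slots are refilled as soon as a task finishes.
estimated_hours = waves * avg_map_hours

# Pessimistic bound if a partially filled last wave still costs a full wave.
upper_bound_hours = math.ceil(waves) * avg_map_hours

speedup = earlier_minutes / current_minutes

print(f"waves: {waves:.2f}")
print(f"estimated duration: {estimated_hours:.1f} h")
print(f"upper bound: {upper_bound_hours:.1f} h")
print(f"speedup over earlier run: {speedup:.2f}x")
```

Under those assumptions the estimate comes to about 11.5 hours, which is in
line with the reported 12 hours.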

Note: Because batching is based on the number of rows rather than on size, 
there is a chance of flooding the region servers with too many write requests 
(from 39 parallel mappers). The region servers then throw 
RegionTooBusyException, and the clients retry after 10 seconds (idle time!), 
with any further failed retries waiting out a backoff time. In that case, job 
execution is delayed much more than usual.
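
One way to picture the alternative is a batcher that flushes on accumulated
size as well as on row count. The following is a minimal, library-independent
sketch and not the code in the attached patch. The class name
SizeBoundedBatcher, the flush callback, and the default limits are all
hypothetical, and byte size is approximated as key length plus value length.

```python
from typing import Callable, List, Tuple

# Hypothetical row shape: (row key, serialized mutation).
Row = Tuple[bytes, bytes]


class SizeBoundedBatcher:
    """Buffers rows and flushes when a row-count or byte-size limit is reached."""

    def __init__(self, flush: Callable[[List[Row]], None],
                 max_rows: int = 1000,
                 max_bytes: int = 2 * 1024 * 1024) -> None:
        self._flush = flush          # callback that would issue the write
        self._max_rows = max_rows    # illustrative default, not from the patch
        self._max_bytes = max_bytes  # illustrative default, not from the patch
        self._rows: List[Row] = []
        self._bytes = 0

    def add(self, key: bytes, value: bytes) -> None:
        """Buffer one row and flush if either limit has been reached."""
        self._rows.append((key, value))
        self._bytes += len(key) + len(value)
        if len(self._rows) >= self._max_rows or self._bytes >= self._max_bytes:
            self.flush()

    def flush(self) -> None:
        """Hand the buffered rows to the callback and reset the buffer."""
        if self._rows:
            self._flush(self._rows)
            self._rows = []
            self._bytes = 0


# Usage: collect flushed batches in a list instead of writing to a cluster.
sent: List[List[Row]] = []
batcher = SizeBoundedBatcher(sent.append, max_rows=100, max_bytes=10)
batcher.add(b"k1", b"abcd")  # 6 bytes buffered, below the 10-byte limit
batcher.add(b"k2", b"abcd")  # 12 bytes buffered, so the batch is flushed
batcher.add(b"k3", b"x")
batcher.flush()              # flush the remainder at the end of the task
```

With a size limit in place, wide rows produce smaller batches, so the volume of
each write request stays bounded regardless of how many columns a row carries.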



> Improve performance of direct HBase API index build
> ---------------------------------------------------
>
>                 Key: PHOENIX-2292
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2292
>             Project: Phoenix
>          Issue Type: Sub-task
>            Reporter: James Taylor
>            Assignee: Ravi Kishore Valeti
>         Attachments: PHOENIX-2292.patch
>
>
> The direct HBase API index build _should_ be almost as fast as the native 
> Phoenix index build, but we're seeing a big difference:
> |  | 100M narrow table (min) | 1B narrow table (min) | 1B wide table (min) |
> | Non MR | 10 | 76 | 511 |
> | HFile MR | 17 | 161 | 1,375 |
> | Direct HBase APIs | 24 | 84 | 1,450 |
> These results are for an 8-node cluster.



