[
https://issues.apache.org/jira/browse/PHOENIX-2292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14959625#comment-14959625
]
Ravi Kishore Valeti edited comment on PHOENIX-2292 at 10/15/15 9:31 PM:
------------------------------------------------------------------------
Setup: 8-node cluster
Data table: 1B-row wide table (20 columns)
NodeManager max memory configured: 10 GB
MapReduce max memory configured: 2 GB
Input splits: 128
Parallel mappers run: 39
Time taken: 720 minutes, as opposed to 1,450 minutes earlier.
The run completed in 12 hrs, with an average map time of 3.5 hrs.
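
As a rough sanity check on those numbers, here is a small sketch of the
arithmetic. It assumes one map task per input split and that every task takes
about the average map time; neither assumption is stated in the run details,
so treat the result as a ballpark only.

```python
import math

# Figures taken from the run described above.
input_splits = 128
parallel_mappers = 39
avg_map_hours = 3.5
old_minutes = 1450
new_minutes = 720

# Assumption: one map task per split, each taking roughly the average time.
waves_ideal = input_splits / parallel_mappers              # perfect packing
waves_strict = math.ceil(input_splits / parallel_mappers)  # whole waves only

est_low_hours = waves_ideal * avg_map_hours    # about 11.5 hrs
est_high_hours = waves_strict * avg_map_hours  # 14 hrs
speedup = old_minutes / new_minutes            # about 2x

print(f"estimated wall time: {est_low_hours:.1f} to {est_high_hours:.1f} hrs")
print(f"speedup over earlier run: {speedup:.2f}x")
```

The observed 12 hrs falls inside that estimated range, which suggests the job
is bounded by map time rather than by scheduling overhead.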
Note: Because batching is based on the number of rows and not on size, there
is a chance of flooding the Region Servers with too many write requests (from
39 parallel mappers). This leads to the Region Servers throwing
RegionTooBusyException and the clients retrying after 10 seconds (idle time!),
with any further failed retries waiting out a backoff time. In that case, job
execution is delayed much more than usual.
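
To illustrate what batching on size (in addition to row count) could look
like, here is a minimal sketch. It is not the code in the attached patch: the
function name `batch_by_size` and the parameters `max_bytes` and `max_rows`
are hypothetical, and each mutation is modelled as a plain byte string.

```python
from typing import Iterable, Iterator, List


def batch_by_size(
    mutations: Iterable[bytes], max_bytes: int, max_rows: int
) -> Iterator[List[bytes]]:
    """Group mutations into batches capped by both total bytes and row count.

    Hypothetical helper for illustration only. A single mutation larger than
    max_bytes is still emitted, alone in its own batch.
    """
    batch: List[bytes] = []
    batch_bytes = 0
    for mutation in mutations:
        size = len(mutation)
        # Flush before adding if either cap would be exceeded.
        if batch and (batch_bytes + size > max_bytes or len(batch) >= max_rows):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(mutation)
        batch_bytes += size
    if batch:
        yield batch


# Narrow rows hit the row cap first; wide rows hit the byte cap first.
narrow_rows = [b"x" * 10] * 6
wide_rows = [b"x" * 100] * 6
narrow_sizes = [len(b) for b in batch_by_size(narrow_rows, max_bytes=250, max_rows=4)]
wide_sizes = [len(b) for b in batch_by_size(wide_rows, max_bytes=250, max_rows=4)]
print(narrow_sizes, wide_sizes)
```

With a byte cap in place, a wide table produces smaller batches per write, so
each mapper sends less data per request than a pure row-count batch would.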
> Improve performance of direct HBase API index build
> ---------------------------------------------------
>
> Key: PHOENIX-2292
> URL: https://issues.apache.org/jira/browse/PHOENIX-2292
> Project: Phoenix
> Issue Type: Sub-task
> Reporter: James Taylor
> Assignee: Ravi Kishore Valeti
> Attachments: PHOENIX-2292.patch
>
>
> The direct HBase API index build _should_ be almost as fast as the native
> Phoenix index build, but we're seeing a big difference:
> | | 100M narrow table (min) | 1B narrow table (min) | 1B wide table (min) |
> | Non MR | 10 | 76 | 511 |
> | HFile MR | 17 | 161 | 1,375 |
> | Direct HBase APIs | 24 | 84 | 1,450 |
> These results are for an 8-node cluster.