[jira] [Commented] (PHOENIX-541) Make mutable batch size bytes-based instead of row-based

Ravi Kishore Valeti (JIRA) Tue, 13 Oct 2015 07:20:12 -0700

    [ 
https://issues.apache.org/jira/browse/PHOENIX-541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14955008#comment-14955008
 ]


Ravi Kishore Valeti commented on PHOENIX-541:
---------------------------------------------

This JIRA could be extremely useful for MR-based secondary index builds, 
especially when they run in memory-constrained containers.

The Resource Manager kills containers that overshoot their memory limit and 
re-triggers them later, which can either lengthen overall job execution 
time or cause the job to fail.

We could size batches dynamically, based on the memory available to the 
Map/Reduce task, so that tasks do not overshoot memory while batching.

Example: if the map task's maximum memory is set to 2 GB, the average row 
size is 2 MB (a wide table) and batching is set to 1000 rows, then the map 
task has to hold about 2 GB worth of mutations in memory. The Resource 
Manager may then kill the task for overshooting memory and re-trigger it 
later. Eventually the job either fails or takes a very long time to 
complete because of the retries.
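
A rough Java sketch of the byte-based idea, to make the arithmetic concrete. 
This is not Phoenix code: the class ByteBoundedBatcher, its methods and the 
25% heap fraction are all hypothetical, and a mutation is modelled as a plain 
byte[] whose length stands in for its real heap footprint (which would be 
larger in practice).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper, not part of Phoenix: flushes by bytes, not by row count. */
final class ByteBoundedBatcher {
    private final long maxBatchBytes;
    private final Consumer<List<byte[]>> flusher;
    private final List<byte[]> pending = new ArrayList<>();
    private long pendingBytes = 0;

    ByteBoundedBatcher(long maxBatchBytes, Consumer<List<byte[]>> flusher) {
        if (maxBatchBytes <= 0) {
            throw new IllegalArgumentException("maxBatchBytes must be positive");
        }
        this.maxBatchBytes = maxBatchBytes;
        this.flusher = flusher;
    }

    /** Byte budget for batching: an assumed fraction of the task's max heap. */
    static long budgetFromHeap(long maxHeapBytes, double fraction) {
        return (long) (maxHeapBytes * fraction);
    }

    /** Row count that fits the budget for a given average row size (at least 1). */
    static long rowsThatFit(long budgetBytes, long avgRowBytes) {
        return Math.max(1, budgetBytes / avgRowBytes);
    }

    /** Buffers one mutation, flushing first if it would push the batch over budget. */
    void add(byte[] mutation) {
        if (!pending.isEmpty() && pendingBytes + mutation.length > maxBatchBytes) {
            flush();
        }
        pending.add(mutation);
        pendingBytes += mutation.length;
    }

    /** Hands the buffered mutations to the flusher and resets the counters. */
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        flusher.accept(new ArrayList<>(pending));
        pending.clear();
        pendingBytes = 0;
    }
}
```

Taking the numbers above in binary units (2 GiB heap, 2 MiB rows) and the 
assumed 25% fraction, budgetFromHeap gives 512 MiB and rowsThatFit gives 256 
rows per batch instead of 1000. In a real task the heap size could be read 
from Runtime.getRuntime().maxMemory().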

[~jamestaylor]

> Make mutable batch size bytes-based instead of row-based
> --------------------------------------------------------
>
>                 Key: PHOENIX-541
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-541
>             Project: Phoenix
>          Issue Type: Improvement
>    Affects Versions: 3.0-Release
>            Reporter: mujtaba
>              Labels: newbie
>
> With the current row-count-based mutable batch size configuration, the ideal 
> value for batch size is around 800 rather than the current 15k when creating 
> indexes, based on memory consumption, CPU and GC (data size: key: ~60 bytes, 
> 14 integer columns in separate CFs)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
