[ 
https://issues.apache.org/jira/browse/PHOENIX-6944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth reassigned PHOENIX-6944:
------------------------------------

    Assignee: Istvan Toth

> Randomize mapper task ordering for MR tools
> -------------------------------------------
>
>                 Key: PHOENIX-6944
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-6944
>             Project: Phoenix
>          Issue Type: Improvement
>          Components: core
>            Reporter: Istvan Toth
>            Assignee: Istvan Toth
>            Priority: Major
>              Labels: perf
>
> Currently, the splits generated by PhoenixInputFormat are in ascending order.
> MR does not use this ordering directly; instead, it orders the splits by 
> size in descending order.
> We set the size of each split to the region size (even when splitting by 
> guideposts, but this is not really a problem).
> The result is that mapper tasks are grouped by region, so usually all the 
> running mappers are working on one or a few regions. As a result, we have the 
> following problems:
> Read hotspotting:
> All scan operations for the indexing job hit the same one or few region 
> servers, causing high load and slowdowns.
> Write hotspotting:
> If the data rowkeys and index rowkeys strongly correlate, then the data read 
> from one or a few data regions will be written to one or a few index regions, 
> causing high load and slowdowns. This is a bit of a corner case; we have 
> observed it when building an index for a column which starts with the same 
> bytes as the primary key of the data table.
> We can improve this by making sure that the generated mapper tasks are 
> executed in a random order. The only way to change the execution order is to 
> manipulate the length of the splits. As the length is only used for ordering 
> and for calculating the completion percentage, this is unlikely to cause 
> problems (we already specify wildly inaccurate lengths when splitting by 
> guideposts).
> I've run some tests on a 50M row, 40GB data table, generating secondary 
> indexes for a correlated field and for a random field.
> The test system has three RS workers and 12 YARN slots for running IndexTool.
> ||Index rebuild time||on correlated field||on random field||
> |w/o randomization|50 min|28 min|
> |w/ randomization|30 min|23 min|

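The ordering effect described in the issue can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions: `SplitOrderSketch`, `Split`, and all method names are hypothetical and are not Phoenix or Hadoop classes, and assigning a random positive length is one possible way to "manipulate the length of the splits", not necessarily what the actual patch does.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

// Illustrative model only: these types are hypothetical, not Phoenix or Hadoop classes.
final class SplitOrderSketch {

    /** A split tagged with its source region and the length it reports to the framework. */
    static final class Split {
        final int region;
        final long reportedLength;

        Split(int region, long reportedLength) {
            this.region = region;
            this.reportedLength = reportedLength;
        }
    }

    /** Builds splits in ascending region order; each split reports its region's size. */
    static List<Split> withRegionSizeLengths(long[] regionSizes, int splitsPerRegion) {
        List<Split> splits = new ArrayList<>();
        for (int region = 0; region < regionSizes.length; region++) {
            for (int i = 0; i < splitsPerRegion; i++) {
                splits.add(new Split(region, regionSizes[region]));
            }
        }
        return splits;
    }

    /** Models the size-descending ordering from the description (stable for equal lengths). */
    static List<Split> scheduleOrder(List<Split> splits) {
        List<Split> ordered = new ArrayList<>(splits);
        ordered.sort(Comparator.comparingLong((Split s) -> s.reportedLength).reversed());
        return ordered;
    }

    /** Assumed fix: report a random positive length so that size ordering acts as a shuffle. */
    static List<Split> withRandomLengths(List<Split> splits, long seed) {
        Random random = new Random(seed);
        List<Split> randomized = new ArrayList<>();
        for (Split s : splits) {
            long fakeLength = 1L + random.nextInt(Integer.MAX_VALUE);
            randomized.add(new Split(s.region, fakeLength));
        }
        return randomized;
    }
}
```

With region-size lengths, `scheduleOrder` returns all splits of the largest region first, then all splits of the next largest, which is the grouping that causes the hotspotting. After `withRandomLengths`, the same sort interleaves splits from different regions.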


--
This message was sent by Atlassian Jira
(v8.20.10#820010)
