[
https://issues.apache.org/jira/browse/PHOENIX-6944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Istvan Toth updated PHOENIX-6944:
---------------------------------
Summary: Randomize mapper task ordering for Index MR tools (was: Randomize
mapper task ordering for MR tools)
> Randomize mapper task ordering for Index MR tools
> -------------------------------------------------
>
> Key: PHOENIX-6944
> URL: https://issues.apache.org/jira/browse/PHOENIX-6944
> Project: Phoenix
> Issue Type: Improvement
> Components: core
> Reporter: Istvan Toth
> Assignee: Istvan Toth
> Priority: Major
> Labels: perf
>
> Currently, the splits generated by PhoenixInputFormat are in ascending order.
> MR does not use this ordering directly; instead, it orders the partitions by
> size in descending order.
> We set the sizes of the splits to the region size. (Even when splitting by
> guideposts, but this is not really a problem.)
> The result is that mapper tasks are grouped by region, so usually all the
> running mappers are working on one or a few regions. As a result, we have the
> following problems:
> Read hotspotting:
> All scan operations for the indexing job hit the same one or a few region
> servers, causing high loads and slowdowns.
> Write hotspotting:
> If the data rowkeys and index rowkeys strongly correlate, then the data read
> from one or a few data regions will be written to one or a few index regions,
> causing high loads and slowdowns. This is a bit of a corner case; we have
> observed it when building an index for a column which starts with the same
> bytes as the primary key for the data table.
> We can improve this by making sure that the generated mapper tasks are executed
> in a random order. The only way to change the execution order is to
> manipulate the length of the splits. As the length is only used for ordering
> and calculating completion percentage, this is unlikely to cause problems (we
> already specify wildly off lengths when splitting by guideposts).
> I've run some tests on a 50M row, 40GB data table, generating secondary
> indexes for a correlated field and for a random field.
> The test system has three RS workers and 12 YARN slots for running IndexTool.
> ||Index rebuild time||on correlated field||on random field||
> |w/o randomization|50 min|28 min|
> |w/ randomization|30 min|23 min|
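
The length manipulation described in the issue can be illustrated with a small stand-alone sketch. This is not the Phoenix patch and does not use the Hadoop or Phoenix APIs: SplitOrderSketch, Split, sampleSplits, randomizeLengths and scheduleOrder are hypothetical names, and the "largest first" sort only models the ordering behaviour the description attributes to MR. It shows that splits reporting the region size stay grouped by region after the sort, while splits reporting a random length end up in an order unrelated to their region.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

public class SplitOrderSketch {
    /** Hypothetical stand-in for an input split: the region it reads and its reported length. */
    public static final class Split {
        public final String region;
        public final long length;
        public Split(String region, long length) {
            this.region = region;
            this.length = length;
        }
    }

    /** Three made-up regions with four splits each; every split reports its region's size. */
    public static List<Split> sampleSplits() {
        String[] regions = {"A", "B", "C"};
        long[] regionSizes = {300L, 200L, 100L};
        List<Split> splits = new ArrayList<>();
        for (int r = 0; r < regions.length; r++) {
            for (int i = 0; i < 4; i++) {
                splits.add(new Split(regions[r], regionSizes[r]));
            }
        }
        return splits;
    }

    /** Replaces each reported length with a random positive value (assumption: any positive length is acceptable). */
    public static List<Split> randomizeLengths(List<Split> splits, Random rnd) {
        List<Split> out = new ArrayList<>();
        for (Split s : splits) {
            long fakeLength = 1L + rnd.nextInt(Integer.MAX_VALUE);
            out.add(new Split(s.region, fakeLength));
        }
        return out;
    }

    /** Models the ordering described in the issue: splits sorted by length, largest first. */
    public static List<Split> scheduleOrder(List<Split> splits) {
        List<Split> sorted = new ArrayList<>(splits);
        sorted.sort(Comparator.comparingLong((Split s) -> s.length).reversed());
        return sorted;
    }

    public static void main(String[] args) {
        List<Split> splits = sampleSplits();
        StringBuilder before = new StringBuilder();
        for (Split s : scheduleOrder(splits)) {
            before.append(s.region);
        }
        StringBuilder after = new StringBuilder();
        for (Split s : scheduleOrder(randomizeLengths(splits, new Random()))) {
            after.append(s.region);
        }
        System.out.println("region order with region-size lengths: " + before);
        System.out.println("region order with random lengths:      " + after);
    }
}
```

Because the reported length also feeds the completion percentage, as the description notes, a change along these lines would trade accurate progress reporting for a more even spread of work.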
--
This message was sent by Atlassian Jira
(v8.20.10#820010)