[ https://issues.apache.org/jira/browse/PHOENIX-6944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Istvan Toth updated PHOENIX-6944:
---------------------------------
    Summary: Randomize mapper task ordering for MR tools  (was: Randomize mapper task ordering for Indexing MR tools)

> Randomize mapper task ordering for MR tools
> -------------------------------------------
>
>                 Key: PHOENIX-6944
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-6944
>             Project: Phoenix
>          Issue Type: Improvement
>          Components: core
>            Reporter: Istvan Toth
>            Priority: Major
>
> Currently, splits generated by PhoenixInputFormat are in ascending order.
> MR does not use this ordering directly; instead, it orders the splits by
> size in descending order.
> We set the sizes of the splits to the region size (even when splitting by
> guideposts, but this is not really a problem).
> The result is that mapper jobs are grouped by region, so usually all of the
> running mappers are working on one or a few regions. As a result, we have the
> following problems:
>
> Read hotspotting:
> All scan operations for the indexing job hit the same one or a few region
> servers, causing high loads and slowdowns.
>
> Write hotspotting:
> If the data rowkeys and index rowkeys strongly correlate, then the data read
> from one or a few data regions will be written to one or a few index regions,
> causing high loads and slowdowns. This is a bit of a corner case; we have
> observed it when building an index on a column that starts with the same
> bytes as the primary key of the data table.
>
> We can improve this by making sure that the generated mapper jobs are executed
> in a random order. The only way to change the execution order is to
> manipulate the length of the splits.
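A minimal, self-contained sketch of the ordering problem and the proposed fix. It does not use Phoenix or Hadoop classes: `SimSplit`, `regionSizedSplits`, `randomizeLengths`, `regionsInFirstWave` and the region/split/slot counts are hypothetical stand-ins chosen for illustration. It sorts splits by reported length, largest first (the ordering the description attributes to MR), and counts how many distinct regions the first wave of mappers touches. With region-sized lengths the first wave lands on a single region; with random lengths it is typically spread across all regions. This illustrates the idea only and is not Phoenix's actual implementation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

/** Hypothetical simulation of mapper ordering; not Phoenix or Hadoop code. */
class SplitOrderingSketch {

    /** Hypothetical stand-in for an input split: the region it reads plus its reported length. */
    static final class SimSplit {
        final String region;
        final long length;

        SimSplit(String region, long length) {
            this.region = region;
            this.length = length;
        }
    }

    /** Orders splits by reported length, largest first, as the description says MR does. */
    static List<SimSplit> submissionOrder(List<SimSplit> splits) {
        List<SimSplit> sorted = new ArrayList<>(splits);
        sorted.sort(Comparator.comparingLong((SimSplit s) -> s.length).reversed());
        return sorted;
    }

    /** Builds splits whose reported length is their region's size. */
    static List<SimSplit> regionSizedSplits(int regions, int splitsPerRegion, long baseRegionSize) {
        List<SimSplit> splits = new ArrayList<>();
        for (int r = 0; r < regions; r++) {
            // Each region gets a distinct size, so all of its splits share one length
            // and end up adjacent after sorting.
            long regionSize = baseRegionSize + r;
            for (int i = 0; i < splitsPerRegion; i++) {
                splits.add(new SimSplit("region-" + r, regionSize));
            }
        }
        return splits;
    }

    /** Keeps each split's region but replaces its reported length with a random positive value. */
    static List<SimSplit> randomizeLengths(List<SimSplit> splits, Random rnd) {
        List<SimSplit> out = new ArrayList<>(splits.size());
        for (SimSplit s : splits) {
            out.add(new SimSplit(s.region, 1L + rnd.nextInt(Integer.MAX_VALUE - 1)));
        }
        return out;
    }

    /** Counts distinct regions among the first n splits, i.e. the first wave of mappers. */
    static long regionsInFirstWave(List<SimSplit> ordered, int n) {
        return ordered.stream().limit(n).map(s -> s.region).distinct().count();
    }

    public static void main(String[] args) {
        // Hypothetical layout: 3 regions with 12 splits each, and 12 mapper slots.
        List<SimSplit> splits = regionSizedSplits(3, 12, 10_000_000_000L);
        int slots = 12;

        long grouped = regionsInFirstWave(submissionOrder(splits), slots);
        long shuffled = regionsInFirstWave(
                submissionOrder(randomizeLengths(splits, new Random(42))), slots);

        System.out.println("regions in first wave, region-sized lengths: " + grouped);
        System.out.println("regions in first wave, random lengths:       " + shuffled);
    }
}
```

Drawing lengths uniformly at random is just one option; any assignment of lengths that is uncorrelated with the region would break up the grouping in the same way.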
> As the length is only used for ordering and for calculating the completion
> percentage, this is unlikely to cause problems (we already specify wildly
> inaccurate lengths when splitting by guideposts).
>
> I've run some tests on a 50M-row, 40 GB data table, generating secondary
> indexes on a correlated field and on a random field.
> The test system has three RegionServer workers and 12 YARN slots for running
> IndexTool.
>
> ||Index rebuild time||on correlated field||on random field||
> |w/o randomization|50 min|28 min|
> |w/ randomization|30 min|23 min|

--
This message was sent by Atlassian Jira
(v8.20.10#820010)