[
https://issues.apache.org/jira/browse/HAWQ-210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15037399#comment-15037399
]
ASF GitHub Bot commented on HAWQ-210:
-------------------------------------
GitHub user zhangh43 opened a pull request:
https://github.com/apache/incubator-hawq/pull/155
HAWQ-210. Improve data locality by calculating the insert host.
Currently, data locality is based on a heuristic greedy algorithm.
It first considers continuous blocks for a vseg, then non-continuous blocks,
and finally non-local blocks.
However, when a file contains several continuous blocks but each vseg can
process only one block due to the average size, the continuous blocks are
assigned to different vsegs one by one and are effectively treated as
non-continuous blocks.
In this improvement, we add continuity information to help choose the right
vseg in the non-continuous block allocation stage. The main idea is to go
through the blocks in a file and find the host that holds the maximum number
of blocks of this file. We call this host the INSERT HOST. When assigning
non-continuous blocks, we prefer the INSERT HOST to other hosts when all of
them allow a local read.
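The insert-host idea described above can be sketched roughly as follows. This is only an illustration in Python, not the code from the pull request: the function names `find_insert_host` and `pick_host_for_block`, the input shape (one list of replica hosts per block), and the tie-breaking rule are all assumptions made for the example.

```python
from collections import Counter


def find_insert_host(block_hosts):
    """Return the host that holds replicas of the most blocks of one file.

    block_hosts is a hypothetical input: one entry per block, each entry
    being the list of hosts that store a replica of that block.
    """
    counts = Counter()
    for hosts in block_hosts:
        # Count a host at most once per block, even if it is listed twice.
        counts.update(set(hosts))
    if not counts:
        return None
    max_count = max(counts.values())
    # Assumption: break ties by host name so the result is deterministic.
    return min(h for h, c in counts.items() if c == max_count)


def pick_host_for_block(local_hosts, insert_host):
    """Choose a host for a non-continuous block.

    All hosts in local_hosts can read the block locally; among them the
    insert host is preferred, otherwise the first candidate is used.
    """
    if insert_host in local_hosts:
        return insert_host
    return local_hosts[0] if local_hosts else None


# Example: three blocks of one file and the hosts holding their replicas.
blocks = [["h1", "h2"], ["h2", "h3"], ["h2", "h1"]]
insert_host = find_insert_host(blocks)
chosen = pick_host_for_block(["h1", "h2"], insert_host)
```

In this sketch the insert host only acts as a tie-breaker among hosts that are already local, which matches the stated preference; how the real allocator weighs it against per-vseg size limits is not covered here.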
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/zhangh43/incubator-hawq hawq210
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-hawq/pull/155.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #155
----
commit 1039f7a59f6d28ee8a258922384ec73768a04b45
Author: hubertzhang <[email protected]>
Date: 2015-12-03T06:56:10Z
HAWQ-210. Improve data locality by calculating the insert host.
----
> Improve data locality by calculating the insert host.
> -----------------------------------------------------
>
> Key: HAWQ-210
> URL: https://issues.apache.org/jira/browse/HAWQ-210
> Project: Apache HAWQ
> Issue Type: Improvement
> Components: Core
> Reporter: Hubert Zhang
> Assignee: Hubert Zhang
>
> Currently, data locality is based on a heuristic greedy algorithm.
> It first considers continuous blocks for a vseg, then non-continuous blocks,
> and finally non-local blocks.
> However, when a file contains several continuous blocks but each vseg can
> process only one block due to the average size, the continuous blocks are
> assigned to different vsegs one by one and are effectively treated as
> non-continuous blocks.
> In this improvement, we add continuity information to help choose the right
> vseg in the non-continuous block allocation stage. The main idea is to go
> through the blocks in a file and find the host that holds the maximum number
> of blocks of this file. We call this host the INSERT HOST. When assigning
> non-continuous blocks, we prefer the INSERT HOST to other hosts when all of
> them allow a local read.