[
https://issues.apache.org/jira/browse/HAWQ-210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hubert Zhang closed HAWQ-210.
-----------------------------
Resolution: Fixed
fixed
> Improve data locality by calculating the insert host.
> -----------------------------------------------------
>
> Key: HAWQ-210
> URL: https://issues.apache.org/jira/browse/HAWQ-210
> Project: Apache HAWQ
> Issue Type: Improvement
> Components: Core
> Reporter: Hubert Zhang
> Assignee: Hubert Zhang
>
> Currently, data locality is based on a heuristic greedy algorithm.
> It first considers continuous blocks for a vseg, then non-continuous blocks,
> and finally non-local blocks.
> However, a file may contain several continuous blocks while each vseg can
> process only one block because of the average size. In this case the
> continuous blocks are assigned to different vsegs one by one, and they are
> treated as non-continuous blocks.
> In this improvement, we add continuity information to help choose the
> right vseg in the non-continuous block allocation stage. The main idea is to
> go through the blocks in a file and find the host that holds the largest
> number of blocks in this file. We call this host the INSERT HOST. When
> assigning non-continuous blocks, we prefer the INSERT HOST to other hosts
> when they all offer local reads.
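
The insert-host idea described above can be illustrated with a minimal Python sketch. This is not the HAWQ implementation (HAWQ core is written in C). The function names, the input layout (one list of replica host names per block), and the tie-breaking rule are assumptions made only for illustration, and the sketch picks a host rather than a vseg.

```python
from collections import Counter


def find_insert_host(block_hosts):
    """Return the host that stores the most blocks of one file.

    block_hosts is a hypothetical input: one list of replica host names
    per block, in file order. Returns None for a file with no blocks.
    """
    counts = Counter()
    for hosts in block_hosts:
        # Count each host at most once per block.
        counts.update(set(hosts))
    if not counts:
        return None
    # Assumed tie-break: highest count first, then host name, so the
    # result is deterministic.
    return min(counts, key=lambda host: (-counts[host], host))


def pick_host_for_block(local_hosts, insert_host):
    """Prefer the insert host among the hosts that can read the block locally."""
    if insert_host in local_hosts:
        return insert_host
    # Otherwise fall back to the first local candidate, if any.
    return local_hosts[0] if local_hosts else None


if __name__ == "__main__":
    blocks = [["h1", "h2", "h3"], ["h2", "h3", "h4"], ["h2", "h5", "h6"]]
    insert_host = find_insert_host(blocks)
    print(insert_host)
    print(pick_host_for_block(["h5", "h2"], insert_host))
```

In this example every block has a replica on h2, so h2 becomes the insert host and is preferred whenever it is among the local candidates for a non-continuous block.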
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)