Hubert Zhang created HAWQ-210:
---------------------------------
Summary: Improve data locality when table size is small
Key: HAWQ-210
URL: https://issues.apache.org/jira/browse/HAWQ-210
Project: Apache HAWQ
Issue Type: Improvement
Components: Core
Reporter: Hubert Zhang
Assignee: Lei Chang
Currently, data locality is based on a heuristic greedy algorithm.
It first considers continuous blocks for a vseg, then non-continuous blocks, and
finally non-local blocks.
But when a file contains several continuous blocks, each vseg may be able to
process only one block because of the average size limit. In this case the
continuous blocks are assigned to different vsegs one by one, so they end up
being treated as non-continuous blocks.
In this improvement, we try to add continuity information to help choose the
right vseg in the non-continuous block allocation stage. The main idea is to go
through the blocks in a file and find the host that holds the largest number of
blocks in this file. We call this host the INSERT HOST. When assigning
non-continuous blocks, we prefer the INSERT HOST over other hosts when they are
all local reads.
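
The idea above can be sketched roughly as follows. This is only an
illustration in Python, not the HAWQ implementation: the function names
(find_insert_host, choose_host) and the input shape (one list of replica hosts
per block) are assumptions made for the example, and ties are broken by host
name purely to keep the result deterministic.

```python
from collections import Counter


def find_insert_host(block_hosts):
    """Return the host holding the most blocks of a file (the INSERT HOST).

    block_hosts is a hypothetical input: one list of replica hosts per block.
    """
    counts = Counter()
    for hosts in block_hosts:
        # Count each host at most once per block, even if listed twice.
        counts.update(set(hosts))
    if not counts:
        return None
    # Highest block count wins; ties are broken by host name (an assumption).
    return min(counts, key=lambda h: (-counts[h], h))


def choose_host(local_candidates, insert_host):
    """Pick a host for a non-continuous block among hosts that are all local."""
    if insert_host in local_candidates:
        return insert_host
    # Fall back to the first local candidate when INSERT HOST is not local.
    return local_candidates[0] if local_candidates else None


# Example: h1 holds a replica of every block, so it becomes the INSERT HOST.
blocks = [["h1", "h2"], ["h1", "h3"], ["h1", "h2"]]
insert_host = find_insert_host(blocks)
print(insert_host, choose_host(["h2", "h1"], insert_host))
```

In this sketch the preference only applies among hosts that can already read
the block locally, matching the "when they are all local reads" condition.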
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)