[jira] Commented: (HIVE-1941) support explicit view partitioning

He Yongqiang (JIRA) Fri, 11 Feb 2011 17:04:21 -0800

    [ 
https://issues.apache.org/jira/browse/HIVE-1941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12993783#comment-12993783
 ]


He Yongqiang commented on HIVE-1941:
------------------------------------

How about the question raised in our internal review:
the relation between the view's partition key and the underlying tables' 
partition keys?

For example:

Is there a use case to support a view def like:

CREATE VIEW vp1
PARTITIONED ON (value)
AS
SELECT key, value
FROM src
WHERE key=86;

Because the view is partitioned on value, which is not the partition key of the 
underlying table. So the Partition on the view seems not very useful.

But if in an example:

CREATE VIEW vp2
PARTITIONED ON (ds)
AS
SELECT src.key, srcpart.value, srcpart.ds
FROM src
join srcpart 
on src.key = srcpart.key
;
The vp2's ds originate from srcpart. So if there is a partition ds='2011' 
existing in srcpart, it makes sense to add a partition ds='2011' in vp2.

Another example:

create table srcpart_1 (key int, value string) partitioned by (ds string);
create table srcpart_2 (key int, value string) partitioned by (ds string, hr 
int);

CREATE VIEW vp2
PARTITIONED ON (ds, hr)
AS
SELECT srcpart_1.key, srcpart_2.value, srcpart_1.ds, srcpart_2.hr
FROM srcpart_1
join srcpart_2 
on srcpart_1.key = srcpart_2.key
and srcpart_1.ds=srcpart_2.ds
;

>From this case the ds originate from srcpart_1, but the join condition put the 
>same ds on srcpart_2. So a query like "select * from vp2 where ds='2000' and 
>hr=11" is very smooth to run.



Some negative examples:

create table srcpart_1 (key int, value string) partitioned by (ds string);
create table srcpart_2 (key int, value string) partitioned by (ds string, hr 
int);

CREATE VIEW vp2
PARTITIONED ON (ds, hr)
AS
SELECT srcpart_1.key, srcpart_2.value, srcpart_1.ds, srcpart_2.hr
FROM srcpart_1
join srcpart_2 
on srcpart_1.key = srcpart_2.key
;
We should not support this because it is hard to a partition pruning.


Another one:
CREATE VIEW vp2
PARTITIONED ON (ds, hr)
AS
SELECT srcpart_1.key, srcpart_2.value, srcpart_1.ds, srcpart_2.hr
FROM srcpart_1
join srcpart_2 
on srcpart_1.key = srcpart_2.key
and srcpart_1.ds=srcpart_2.hr
;

we should probably also throw an error this definition.


I am not sure. But i think it makes sense to include the view's partition keys 
are from the underlying tables. Even not, we should make sure a prefix of the 
partition keys are from underlying tables. 
And for partition keys from base tables, we should also make sure they cover a 
prefix of the base tables' partition keys (not randomly chosen).

A big problem of this implicit partition relationship between a partitioned 
view and the base tables is how to manage the relationships to keep consistent. 
For example, what if the base table's partition got dropped and the view's 
partition is still there? I think we should be fine here.

> support explicit view partitioning
> ----------------------------------
>
>                 Key: HIVE-1941
>                 URL: https://issues.apache.org/jira/browse/HIVE-1941
>             Project: Hive
>          Issue Type: New Feature
>          Components: Query Processor
>    Affects Versions: 0.6.0
>            Reporter: John Sichi
>            Assignee: John Sichi
>         Attachments: HIVE-1941.1.patch, HIVE-1941.2.patch, HIVE-1941.3.patch, 
> HIVE-1941.4.patch
>
>
> Allow creation of a view with an explicit partitioning definition, and 
> support ALTER VIEW ADD/DROP PARTITION for instantiating partitions.
> For more information, see
> http://wiki.apache.org/hadoop/Hive/PartitionedViews

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Commented: (HIVE-1941) support explicit view partitioning

Reply via email to