[ 
https://issues.apache.org/jira/browse/HIVE-449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12702981#action_12702981
 ] 

Jeff Hammerbacher commented on HIVE-449:
----------------------------------------

https://issues.apache.org/jira/browse/HIVE-29 would be another, potentially 
more elegant, approach to this problem.

> Automatic memoization of intermediate data tables
> -------------------------------------------------
>
>                 Key: HIVE-449
>                 URL: https://issues.apache.org/jira/browse/HIVE-449
>             Project: Hadoop Hive
>          Issue Type: Improvement
>            Reporter: Venky Iyer
>
> Processing data with Hive encourages you to specify your data transformation 
> as fairly complex nested joins, CLUSTER BYs, GROUP BYs, etc., 
> supplementing functionality with custom transforms where necessary. This 
> has two disadvantages. First, it is hard to inspect the output of 
> intermediate phases. Second, when the custom TRANSFORM script at the end of 
> a long chain of mapreduce jobs fails with syntax errors or bugs, you have to 
> rerun all the previous steps before you can check whether you fixed the 
> script. This can be alleviated by providing functionality to capture 
> specific steps in intermediate tables automatically, allowing you to be 
> expressive in HiveQL without having to keep track of all the intermediate 
> tables by hand. 
> You may need a way to name queries and phases, so that you have a way of 
> identifying which intermediate tables belong to which phases of which queries.
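
To make the naming idea in the description concrete, here is a minimal 
sketch in Python. It is not Hive code and does not reflect any existing Hive 
feature. The names intermediate_table_name, PhaseMemo, and run_phase, the 
"memo_" prefix, and the in-memory dictionary standing in for the metastore 
are all assumptions made for illustration. Each phase gets a table name 
derived from the query name, the phase name, and a hash of the phase's 
HiveQL text, and a phase is rerun only when no table with that name exists.

```python
import hashlib


def intermediate_table_name(query_name, phase_name, phase_sql):
    """Build a deterministic table name for one phase of a named query."""
    # Normalize whitespace so reformatting the SQL does not change the name.
    normalized = " ".join(phase_sql.split())
    digest = hashlib.sha1(normalized.encode("utf-8")).hexdigest()[:12]
    return "memo_{0}_{1}_{2}".format(query_name, phase_name, digest)


class PhaseMemo(object):
    """Hypothetical catalog of memoized intermediate tables (in memory)."""

    def __init__(self):
        self.tables = {}  # table name -> stored result

    def run_phase(self, query_name, phase_name, phase_sql, execute):
        """Run a phase via execute(sql) unless its table already exists."""
        name = intermediate_table_name(query_name, phase_name, phase_sql)
        if name not in self.tables:
            # Only pay for the expensive step when nothing is memoized.
            self.tables[name] = execute(phase_sql)
        return name, self.tables[name]
```

With something like this, fixing a bug in the final TRANSFORM script would 
rerun only that last phase, since the earlier phases hash to names that 
already exist. The sketch ignores one thing a real design would have to 
handle: a phase's name should probably also depend on its upstream phases 
and input data, so that changing an early step invalidates everything 
downstream of it.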

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
