[ 
https://issues.apache.org/jira/browse/HIVE-431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12740329#action_12740329
 ] 

Adam Kramer commented on HIVE-431:
----------------------------------

Another note: Normal SQL has no equivalent of this, because normal SQL 
allows for updates--so the SELECT query that generated a table's data could 
quickly become obsolete. Not so with Hive!

This is also very useful metadata to search, as it lets us cross-index tables 
to know which tables feed which other tables. This will help us detect 
aggregate tables (to avoid re-aggregation) and identify dependencies among 
tables (so if we change a given table's contents, we will have a good guess at 
who will be affected).
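
To illustrate the cross-indexing idea, here is a rough Python sketch, not an 
existing Hive feature. It assumes the stored SELECT statements have already 
been read out of the table properties into a dict. The table names, queries, 
and helper names (source_tables, build_feeds_index) are all hypothetical, and 
the regex is deliberately naive: it ignores subqueries, comments, and quoted 
identifiers, so a real version would want a proper parser.

```python
import re
from collections import defaultdict

# Naive pattern: the identifier that follows FROM or JOIN. This is only a
# sketch; it skips subqueries, comments, and quoted identifiers.
_SOURCE_RE = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def source_tables(select_query):
    """Return the set of table names a SELECT statement appears to read."""
    return {name.lower() for name in _SOURCE_RE.findall(select_query)}


def build_feeds_index(select_by_table):
    """Map each source table to the set of tables generated from it."""
    feeds = defaultdict(set)
    for target, query in select_by_table.items():
        for source in source_tables(query):
            feeds[source].add(target.lower())
    return dict(feeds)


# Hypothetical stored "select" properties, keyed by the table they filled.
stored = {
    "daily_clicks": (
        "SELECT ds, user_id, COUNT(1) AS c FROM raw_clicks "
        "GROUP BY ds, user_id"
    ),
    "weekly_clicks": (
        "SELECT d.user_id, SUM(d.c) FROM daily_clicks d "
        "JOIN users u ON d.user_id = u.id GROUP BY d.user_id"
    ),
}

feeds = build_feeds_index(stored)
# Tables that would be affected if raw_clicks changed (direct dependents only).
print(sorted(feeds.get("raw_clicks", set())))
```

Looking up a table in the resulting index gives its direct dependents; 
following the index repeatedly would give the full set of downstream tables.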

> Auto-add table property "select" to be the select statement that created the 
> table
> ----------------------------------------------------------------------------------
>
>                 Key: HIVE-431
>                 URL: https://issues.apache.org/jira/browse/HIVE-431
>             Project: Hadoop Hive
>          Issue Type: Wish
>            Reporter: Adam Kramer
>
> A syntactic copy of the query that was used to fill a table would often be 
> AMAZINGLY useful for figuring out where the data in the table came from.
> I think the best way to implement this would be to automatically add a table 
> property which includes the SELECT statement. For partitioned tables, this 
> would need to exist for each partition...or perhaps use some canonical name 
> like selectquery for unpartitioned tables, plus selectquery_ds=<DATEID> for 
> partitioned tables.
> This problem is growing as more and more tables in our database are generated 
> by either "root" or by people who are no longer easy to contact.
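
The naming scheme proposed in the description could be sketched as below. 
This is only an illustration of the suggested key names (selectquery and 
selectquery_ds=<DATEID>); the function name, the dict standing in for table 
properties, and the sample date value are assumptions, and nothing here 
calls a real Hive API.

```python
def select_property_key(partition_ds=None):
    """Return the property key under which the SELECT text would be stored.

    Unpartitioned tables use the plain key; partitioned tables get one key
    per ds partition, as suggested in the issue description.
    """
    if partition_ds is None:
        return "selectquery"
    return "selectquery_ds={}".format(partition_ds)


def record_select(table_properties, select_query, partition_ds=None):
    """Store the query text in a dict that stands in for table properties."""
    table_properties[select_property_key(partition_ds)] = select_query
    return table_properties


# Hypothetical usage: one partitioned load, recorded per partition.
props = {}
record_select(props, "SELECT * FROM raw_clicks WHERE ds = '2009-08-06'",
              partition_ds="2009-08-06")
print(props)
```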

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
