[ https://issues.apache.org/jira/browse/HIVE-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836103#action_12836103 ]
Zheng Shao commented on HIVE-1131:
----------------------------------

S1. Can we make lineage partition-level instead of table-level?

S2. We might want to formally define the concepts of these levels, especially how they compose (what is a UDAF of a UDF, or a UDF of a UDAF, such as round(sum(col)) or sum(round(col))?).

{code}
+  /**
+   * Enum to track dependency. This enum has two values:
+   * 1. SCALAR - Indicates that the column is derived from a scalar expression.
+   * 2. AGGREGATION - Indicates that the column is derived from an aggregation.
+   */
+  public static enum DependencyType {
+    SIMPLE, UDF, UDAF, UDTF, SCRIPT, SET
+  }
+
{code}

S3. Use "{}" even for a single statement in "if", "for", etc.

S4. Use "ArrayList" instead of "Vector" when it's accessed by a single thread.

S5. Remove "private HashMap<FileSinkOperator, Table> fopToTable;" since it's not used.

> Add column lineage information to the pre execution hooks
> ---------------------------------------------------------
>
>                 Key: HIVE-1131
>                 URL: https://issues.apache.org/jira/browse/HIVE-1131
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Query Processor
>            Reporter: Ashish Thusoo
>            Assignee: Ashish Thusoo
>         Attachments: HIVE-1131.patch
>
>
> We need a mechanism to pass the lineage information of the various columns of
> a table to a pre execution hook so that applications can use that for:
> - auditing
> - dependency checking
> and many other applications.
> The proposal is to expose this through a bunch of classes to the pre
> execution hook interface to the clients and put in the necessary
> transformation logic in the optimizer to generate this information.
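
To make the composition question in S2 concrete, here is a minimal sketch of one possible rule. It is not what the patch does. It assumes, purely for illustration, that a nested expression takes whichever dependency kind appears later in the enum's declaration order. The class name DependencyCompositionSketch and the methods compose and composeAll are hypothetical; only the enum constants come from the patch.

```java
import java.util.Arrays;
import java.util.List;

public class DependencyCompositionSketch {

  // Same constants as the enum quoted from the patch.
  // Treating declaration order as a "strength" ordering is an assumption.
  public enum DependencyType {
    SIMPLE, UDF, UDAF, UDTF, SCRIPT, SET
  }

  /**
   * Hypothetical rule: composing two kinds keeps the one declared later
   * in the enum. Enum.compareTo compares by declaration order.
   */
  public static DependencyType compose(DependencyType outer, DependencyType inner) {
    return outer.compareTo(inner) >= 0 ? outer : inner;
  }

  /**
   * Folds a chain of expression kinds, outermost first,
   * e.g. [UDF, UDAF, SIMPLE] for round(sum(col)).
   */
  public static DependencyType composeAll(List<DependencyType> chain) {
    DependencyType result = DependencyType.SIMPLE;
    for (DependencyType type : chain) {
      result = compose(result, type);
    }
    return result;
  }

  public static void main(String[] args) {
    // round(sum(col)): a UDF applied to a UDAF
    System.out.println(composeAll(Arrays.asList(
        DependencyType.UDF, DependencyType.UDAF, DependencyType.SIMPLE)));
    // sum(round(col)): a UDAF applied to a UDF
    System.out.println(composeAll(Arrays.asList(
        DependencyType.UDAF, DependencyType.UDF, DependencyType.SIMPLE)));
  }
}
```

Under this assumed rule both round(sum(col)) and sum(round(col)) come out as UDAF, because the rule is symmetric. If the two cases should be distinguishable, the definition would need to record the nesting order rather than a single enum value.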