[ https://issues.apache.org/jira/browse/HIVE-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836103#action_12836103 ]
Zheng Shao commented on HIVE-1131:
----------------------------------
S1. Can we make lineage partition-level instead of table-level?
S2. We might want to formally define these dependency levels, especially
how they compose. For example, what is the type of a UDF applied to a UDAF,
like round(sum(col)), or of a UDAF applied to a UDF, like sum(round(col))?
{code}
+ /**
+ * Enum to track dependency. This enum has two values:
+ * 1. SCALAR - Indicates that the column is derived from a scalar expression.
+ * 2. AGGREGATION - Indicates that the column is derived from an aggregation.
+ */
+ public static enum DependencyType {
+ SIMPLE, UDF, UDAF, UDTF, SCRIPT, SET
+ }
+
{code}
S3. Use "{}" even for a single-statement body in "if", "for", etc.
S4. Use "ArrayList" instead of "Vector" when the list is accessed by a single thread.
S5. Remove "private HashMap<FileSinkOperator, Table> fopToTable;" since it's
not used.
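To make the question in S2 concrete, here is a minimal sketch of one possible composition rule. It is not taken from the HIVE-1131 patch: the class name LineageCompositionSketch, the compose method, and the rule itself (the composed expression takes whichever type appears later in the enum's declaration order) are all assumptions for illustration. Only the enum constants come from the quoted code.

```java
// Hypothetical sketch; not part of the HIVE-1131 patch.
final class LineageCompositionSketch {

  /** Same constants, in the same order, as the enum quoted in the review. */
  enum DependencyType {
    SIMPLE, UDF, UDAF, UDTF, SCRIPT, SET
  }

  /**
   * Assumed rule: when one expression wraps another, the result takes the
   * type that is declared later in the enum. This ordering is an assumption,
   * not something the patch defines.
   */
  static DependencyType compose(DependencyType outer, DependencyType inner) {
    if (outer.compareTo(inner) >= 0) {
      return outer;
    }
    return inner;
  }

  public static void main(String[] args) {
    // round(sum(col)): a UDF applied to a UDAF.
    System.out.println(compose(DependencyType.UDF, DependencyType.UDAF));
    // sum(round(col)): a UDAF applied to a UDF.
    System.out.println(compose(DependencyType.UDAF, DependencyType.UDF));
  }
}
```

Under this assumed rule both examples from S2 resolve to UDAF. A different rule could just as well be chosen; the point is only that the rule should be written down explicitly.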
> Add column lineage information to the pre execution hooks
> ---------------------------------------------------------
>
> Key: HIVE-1131
> URL: https://issues.apache.org/jira/browse/HIVE-1131
> Project: Hadoop Hive
> Issue Type: New Feature
> Components: Query Processor
> Reporter: Ashish Thusoo
> Assignee: Ashish Thusoo
> Attachments: HIVE-1131.patch
>
>
> We need a mechanism to pass the lineage information of the various columns of
> a table to a pre execution hook so that applications can use that for:
> - auditing
> - dependency checking
> and many other applications.
> The proposal is to expose this information to clients through a set of
> classes on the pre execution hook interface, and to put the necessary
> transformation logic in the optimizer to generate it.