> On June 20, 2016, 7:25 a.m., Hemanth Yamijala wrote:
> > addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/HiveHook.java, line 625
> > <https://reviews.apache.org/r/48939/diff/2/?file=1423788#file1423788line625>
> >
> >     This may be a non-issue, but previously we were passing two
> >     independent sets for source & target datasets (as opposed to a single
> >     dataSetsProcessed parameter now, which is common between source & target).
> >     The impact is that if a dataset is present in both input and output
> >     (impossible - hence non-issue?), it would get captured only once.
> >     Further, I see that dataSetsProcessed is not used in the calling function.
> >     Hence, consider making it local to this function?
>
> Hemanth Yamijala wrote:
>     Regarding the latter part of my comment about dataSetsProcessed not being
>     used in the calling function - please ignore that, as I forgot about the
>     outer loop in the calling function.

Yes, this should be a non-issue, since the inputs and outputs cannot contain the same dataset.


- Suma


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48939/#review138534
-----------------------------------------------------------


On June 20, 2016, 4 a.m., Suma Shivaprasad wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/48939/
> -----------------------------------------------------------
>
> (Updated June 20, 2016, 4 a.m.)
>
>
> Review request for atlas, Shwetha GS and Hemanth Yamijala.
>
>
> Bugs: ATLAS-904
>     https://issues.apache.org/jira/browse/ATLAS-904
>
>
> Repository: atlas
>
>
> Description
> -------
>
> 1. Process qualified name = HiveOperation.name + sorted inputs + sorted
>    outputs
> 2. HiveOperation.name doesn't provide identifiers for identifying INSERT,
>    INSERT_OVERWRITE, UPDATE, DELETE etc. separately. Hence adding
>    WriteEntity.WriteType as well, which exhibits the following behaviour:
>    a. If there are multiple outputs, adds the query type (WriteType) for
>       each output.
>    b. If the query being run is of type INSERT [into/overwrite] TABLE
>       [PARTITION], WriteType is INSERT/INSERT_OVERWRITE.
>    c. If the query is of type INSERT OVERWRITE hdfs_path, adds WriteType as
>       PATH_WRITE.
>    d. If the query is of type UPDATE/DELETE, adds type as UPDATE/DELETE
>       [Note - lineage is not available for this since this is a single-table
>       operation].
> 3. When the input is a local dir or hdfs path, it currently doesn't add it
>    to the qualified name. The reason is that partition-based paths cause a
>    lot of processes to be created in this case instead of updating the same
>    process.
>
> Pending:
> Address Shwetha GS's suggestion to add hdfs paths to the process qualified
> name only in case of non-partition-based queries. This needs to be done per
> HiveOperation type:
> 1. If HiveOperation = LOAD, IMPORT, EXPORT - detect if the current query
>    context is dealing with partitions, and do not add if it is partition
>    based.
> 2. If HiveOperation = INSERT OVERWRITE DFS_PATH/LOCAL_PATH - detect if the
>    query context is dealing with a partitioned table in inputs, and decide
>    if we need to add or not.
>
>
> Diffs
> -----
>
>   addons/hive-bridge/src/main/java/org/apache/atlas/hive/bridge/HiveMetaStoreBridge.java c956a32
>   addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/HiveHook.java 23c82df
>   addons/hive-bridge/src/test/java/org/apache/atlas/hive/hook/HiveHookIT.java e7fbf71
>   webapp/src/main/java/org/apache/atlas/web/resources/EntityResource.java 0713d30
>
> Diff: https://reviews.apache.org/r/48939/diff/
>
>
> Testing
> -------
>
> Existing tests modified to query with the new qualified name. Need to add
> tests for INSERT INTO TABLE.
>
>
> Thanks,
>
> Suma Shivaprasad
>
>
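
For readers following the thread, the naming scheme in the description (operation name + sorted inputs + sorted outputs, with a write type attached to each output) could be sketched roughly as below. This is not the code from the patch: the class name, the method name, the ":" and "->" separators, and the use of plain strings in place of Hive's ReadEntity/WriteEntity objects are all assumptions made for illustration. The filtering of local dir and hdfs path inputs mentioned in item 3 is left out.

```java
import java.util.Collection;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Illustrative sketch only; names and separators are hypothetical,
// not taken from HiveHook.java.
final class ProcessQualifiedNameSketch {
    private static final String SEP = ":";      // assumed field separator
    private static final String IO_SEP = "->";  // assumed inputs/outputs divider

    private ProcessQualifiedNameSketch() {
    }

    /**
     * Builds a deterministic name from the operation name, the inputs, and
     * the outputs. Each output is keyed by dataset name and carries the
     * write type (for example INSERT, INSERT_OVERWRITE, PATH_WRITE).
     */
    static String build(String operationName,
                        Collection<String> inputs,
                        Map<String, String> outputsWithWriteType) {
        StringBuilder name = new StringBuilder(operationName);

        // TreeSet sorts the inputs and drops duplicates, so the order in
        // which the query lists its inputs does not change the result.
        for (String input : new TreeSet<>(inputs)) {
            name.append(SEP).append(input);
        }

        name.append(IO_SEP);

        // TreeMap sorts the outputs by dataset name; the write type is
        // appended per output, as in item 2a of the description.
        for (Map.Entry<String, String> output
                : new TreeMap<>(outputsWithWriteType).entrySet()) {
            name.append(SEP).append(output.getKey())
                .append(SEP).append(output.getValue());
        }
        return name.toString();
    }
}
```

Under these assumptions, a call such as build("QUERY", Arrays.asList("default.t2", "default.t1"), Collections.singletonMap("default.out", "INSERT")) gives the same string regardless of input order, which is what lets a repeated query update the existing process instead of creating a new one.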
