We are in agreement, I believe. You are correct. It's not very useful at all... but that's the inferred lineage, or lineage "once established", which is garnered from the details of the things that connect other things together. In my research thus far, those "things" are defined quite nicely, and lineage is only the secondary goal for import of metadata from Atlas for the customers I've been speaking with.

Gotta first get the "things" (assets) so that they can be managed, given assigned Terms, Stewards, etc. And like importing "Informatica" or "SQL" into IGC, the task is the same: get the "thing" and its properties, and then get the "connections" that it makes. From my research, those connections are very well defined as inputs and outputs... edges? At this point, who cares. There "may" be some aspects of that that are useful, in case the lineage is not defined by the assets themselves, but thus far I have made progress without having to touch the "lineage" API calls.
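A minimal sketch of the "get the thing, then get its connections" approach described above. It assumes each process asset has already been fetched and reduced to a plain dictionary with a `guid`, a list of `inputs`, and a list of `outputs`; those field names and the sample GUIDs are illustrative assumptions, not a confirmed Atlas payload shape.

```python
from typing import Dict, Iterable, List, Tuple


def lineage_triples(processes: Iterable[Dict]) -> List[Tuple[str, str, str]]:
    """Pair every input of a process with every output of that process.

    Returns (source_guid, process_guid, target_guid) triples. The keys
    "guid", "inputs" and "outputs" are hypothetical names for this sketch.
    """
    triples = []
    for proc in processes:
        for source in proc.get("inputs", []):
            for target in proc.get("outputs", []):
                triples.append((source, proc["guid"], target))
    return triples


# Made-up sample data: a sqoop load followed by a hive transformation.
sample_processes = [
    {"guid": "p1", "typeName": "sqoop_process",
     "inputs": ["db_table"], "outputs": ["hive_raw"]},
    {"guid": "p2", "typeName": "hive_process",
     "inputs": ["hive_raw"], "outputs": ["hive_clean"]},
]

for triple in lineage_triples(sample_processes):
    print(triple)
```

Because each triple carries its own direction (input, process, output), the source-to-target relationship does not depend on the order in which the assets were returned.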
As for "partnering", it seems to be a creative marriage of brain power. We have people working on Atlas, but focusing mostly on its integration with our "NGP"... but making contributions. Usually, though, when I needed some details on Atlas, I was encouraged to send emails directly to the Atlas dev community, and that is how I have made the most progress. There is an email list server there for "atlas <[email protected]>". It seems to work like techlist, and the participants are fairly responsive. I've had mixed success, but have managed to get myself fairly deep into Atlas on my own.

I have published two recordings thus far for our "governance user" (less technical) customers who are investigating Atlas:

https://www.youtube.com/watch?v=C4lf_EFduqU [overall intro to starting up the HW vm image and getting into Atlas...very basic]
https://www.youtube.com/watch?v=6Us2zG-WvS8 [intro to its REST API]

Which customer are you working with on an Atlas integration?

Ernie

Ernie Ostic
WW Product Specialist, Information Server
IBM Analytics
Cell: (617) 331 8238
---------------------------------------------------------------
Open IGC is here! Extend the Catalog with custom objects and lineage definitions!
https://dsrealtime.wordpress.com/2015/07/29/open-igc-is-here/

From: Russell Anderson/Worcester/IBM
To: Ernie Ostic/Newark/IBM@IBMUS
Cc: Russell Anderson/Worcester/IBM@IBMUS
Date: 07/07/2016 09:56 AM
Subject: Re: Expanded my Cloudera Metabroker to now process Horton Works...one issue

Hi Ernie,

If you mean this:
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.2/bk_data_governance/content/section_atlas_restapi_hivelineageresource.html
I disagree. There is no clear illustration of how to order the returned results of the 'edges'. This is, by the way, the biggest hole with both HW/Atlas/Apache and with Cloudera. I had to reverse engineer the Cloudera lineage (very well, I might add) with little to no help from Cloudera.
However, at least Cloudera has an implied way to create the correct lineage ordering. That is not true with Apache/Atlas - HW.

QUESTION: We are supposed to have a partnership with Horton Works. Do we have people that we can speak to on this topic?

Regards,
Russell G. Anderson
Senior Technical Consultant

From: Ernie Ostic/Newark/IBM
To: Russell Anderson/Worcester/IBM@IBMUS
Date: 07/06/2016 10:23 PM
Subject: Re: Expanded my Cloudera Metabroker to now process Horton Works...one issue

; ) no... absolutely not... lol... it's all still experimentation... but I looked at what that returns and decided: not enough detail. So I have been chasing the processes themselves, which have clear illustrations of inputs and outputs.

Ernie

Ernie Ostic
WW Product Specialist, Information Server
IBM Analytics
Cell: (617) 331 8238

From: Russell Anderson/Worcester/IBM
To: Ernie Ostic/Newark/IBM@IBMUS
Cc: Russell Anderson/Worcester/IBM@IBMUS
Date: 07/06/2016 03:34 PM
Subject: Re: Expanded my Cloudera Metabroker to now process Horton Works...one issue

Ernie,

I have created a specific test case lineage whereby I am intentionally causing it to create a new hive table from an existing table. You will find that inputs/graph and outputs/graph (if it exists) return the results in terms of 'edges' and 'vertices'. An edge is the flow between the entities (each entity is a vertex). The issue is that JSON can never return a set of edges in any assured order. So I am now producing lineage, but I have no way of determining the proper order. Second, it appears that the edges results are returning duplicated GUIDs too.

So, do you think you have got it all working yet?

Regards,
Russell G. Anderson
Senior Technical Consultant

From: Ernie Ostic/Newark/IBM
To: Russell Anderson/Worcester/IBM@ibmus
Cc: Russell Anderson/Worcester/IBM@ibmus
Date: 07/06/2016 03:29 PM
Subject: Re: Expanded my Cloudera Metabroker to now process Horton Works...one issue

I will take a look later tonight... have been at a customer all day today... but in the Atlas work I have been doing, I see it as a custom effort for each and every meta Type, and then, if that meta type inspires lineage (such as with type sqoop_process), I can illustrate it. Is that what you are doing? Which Type did you begin with? Or are you just capturing generic lineage results/graphs?

Ernie

Sent from my iPad using IBM Verse

On Jul 6, 2016, 3:15:09 PM, [email protected] wrote:

From: [email protected]
To: [email protected]
Cc: [email protected]
Date: Jul 6, 2016 3:15:09 PM
Subject: Expanded my Cloudera Metabroker to now process Horton Works...one issue

Hi Ernie,

I have managed to expand my Cloudera Metabroker code to now include Horton Works Apache Atlas. I have posted the following:
https://community.hortonworks.com/questions/43778/the-results-returned-from-http192168114421000apiat.html

One really bad thing about JSON is that you cannot assure the order of your results (see many references on the internet). So the Inputs/Graph returns "edges" and "vertices". The problem is that the edges appear in no particular order, and moreover there appear to be duplicate references to the GUIDs.

Do you know by what means you are supposed to glean the proper source-to-target ordering? I know you have been playing around with this, but I am not sure whether you have gotten this far with understanding the results. Let me know.

Regards,
Russell G. Anderson
Senior Technical Consultant
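
On the ordering question in the original message: JSON object members carry no guaranteed order (JSON arrays do), so one option is to ignore the order in which the edges arrive and recover the source-to-target sequence from the graph structure itself with a topological sort. The sketch below is only an illustration under stated assumptions: it assumes the 'edges' result has been parsed into a mapping from one GUID to a list of GUIDs it flows into, and that each edge points from source to target. Both the payload shape and the direction are assumptions to verify against the real response; if the direction is actually reversed, reverse the resulting list. The function name and the sample GUIDs are made up.

```python
from collections import deque
from typing import Dict, Iterable, List


def order_lineage(edges: Dict[str, Iterable[str]]) -> List[str]:
    """Return GUIDs in source-to-target order using Kahn's algorithm.

    `edges` maps a GUID to the GUIDs it flows into (assumed direction).
    Repeated GUIDs in a target list are collapsed before sorting.
    """
    adjacency: Dict[str, set] = {}
    indegree: Dict[str, int] = {}
    for source, targets in edges.items():
        adjacency.setdefault(source, set())
        indegree.setdefault(source, 0)
        for target in targets:
            adjacency.setdefault(target, set())
            indegree.setdefault(target, 0)
            if target not in adjacency[source]:  # skip duplicate references
                adjacency[source].add(target)
                indegree[target] += 1

    # Start from the vertices nothing flows into; sort for a stable result.
    ready = deque(sorted(g for g, d in indegree.items() if d == 0))
    ordered: List[str] = []
    while ready:
        guid = ready.popleft()
        ordered.append(guid)
        for nxt in sorted(adjacency[guid]):
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)

    if len(ordered) != len(adjacency):
        raise ValueError("cycle found; the edge direction assumption may be wrong")
    return ordered


# Made-up edges, deliberately out of order and with a duplicated GUID.
sample_edges = {
    "hive_process_1": ["table_b", "table_b"],
    "table_a": ["hive_process_1"],
}
print(order_lineage(sample_edges))
```

With this approach the duplicated GUIDs are harmless, since each distinct edge is counted once, and the result is the same no matter how the parser happens to order the edge entries.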
