We are in agreement, I believe. You are correct: it's not very useful
at all. But that's the inferred lineage, or lineage "once established",
which is garnered from the details of the things that connect other things
together. In my research thus far, those "things" are defined quite
nicely, and lineage is only the secondary goal for import of metadata from
Atlas for the customers I've been speaking with. You first have to get the
"things" (assets) so that they can be managed and assigned Terms,
Stewards, etc. Like importing "Informatica" or "SQL" into IGC, the task is
the same: get the "thing" and its properties, and then get the
"connections" that it makes. From my research, those connections are very
well defined as inputs and outputs. Edges? At this point, who cares.
There "may" be some aspects of the lineage API that are useful, in case
the lineage is not defined by the assets themselves, but thus far I have
made progress without having to touch the "lineage" API calls.
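
To make the "thing first, connections second" idea concrete, here is a rough Python sketch. It assumes each entity has already been fetched and reduced to a plain dict; the keys `id`, `type`, `name`, `inputs` and `outputs` and the function name are illustrative assumptions, not the actual Atlas or IGC structures.

```python
def split_assets_and_connections(entities):
    """Collect assets first, then the connections they declare.

    `entities` is a list of dicts with hypothetical keys:
    id, type, name, and optionally inputs / outputs (lists of ids).
    """
    assets = {}
    connections = []

    # Pass 1: every entity is an asset with properties, lineage or not.
    for entity in entities:
        assets[entity["id"]] = {"type": entity["type"], "name": entity["name"]}

    # Pass 2: only entities that declare inputs/outputs yield connections.
    for entity in entities:
        for source_id in entity.get("inputs", []):
            connections.append((source_id, entity["id"]))
        for target_id in entity.get("outputs", []):
            connections.append((entity["id"], target_id))

    return assets, connections
```

The assets are usable on their own (for Terms, Stewards, etc.) even if the second pass finds nothing to connect.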

As for "partnering", it seems to be a creative marriage of brain power. We
have people working on Atlas, focusing mostly on its integration with our
"NGP" but also making contributions. Usually, though, when I needed
details on Atlas, I was encouraged to send emails directly to the Atlas dev
community, and that is where I have made the most progress. There is an
email list server there for "atlas <[email protected]>"; it seems
to work like techlist, and the participants are fairly responsive.

I've had mixed success, but have managed to get myself fairly deep into
Atlas on my own. I have published two recordings thus far for our
"governance user" (less technical) customers who are investigating Atlas:

https://www.youtube.com/watch?v=C4lf_EFduqU   [overall intro to starting up
the HW vm image and getting into Atlas...very basic]

https://www.youtube.com/watch?v=6Us2zG-WvS8  [intro to its REST API].

Which customer are you working with on an Atlas integration?

Ernie





Ernie Ostic

WW Product Specialist, Information Server
IBM Analytics
Cell: (617) 331 8238
---------------------------------------------------------------
Open IGC is here!

Extend the Catalog with custom objects and lineage definitions!
https://dsrealtime.wordpress.com/2015/07/29/open-igc-is-here/



From:   Russell Anderson/Worcester/IBM
To:     Ernie Ostic/Newark/IBM@IBMUS
Cc:     Russell Anderson/Worcester/IBM@IBMUS
Date:   07/07/2016 09:56 AM
Subject:        Re: Expanded my Cloudera Metabroker to now process Horton
            Works...one issue


Hi Ernie,

If you mean this:
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.2/bk_data_governance/content/section_atlas_restapi_hivelineageresource.html

I disagree.

There is no clear illustration of how to order the returned results of the
'edges'.

This is, by the way, the biggest hole with both HW/Atlas/Apache and with
Cloudera.

I had to reverse engineer the Cloudera lineage (very well, I might add),
with little to no help from Cloudera. However, at least Cloudera has an
implied way to create the correct lineage ordering.

That is not true with Apache/Atlas - HW.

QUESTION: We are supposed to have a partnership with Hortonworks. Do we
have people that we can speak to on this topic?

Regards,

Russell G. Anderson
 Senior Technical Consultant





From:   Ernie Ostic/Newark/IBM
To:     Russell Anderson/Worcester/IBM@IBMUS
Date:   07/06/2016 10:23 PM
Subject:        Re: Expanded my Cloudera Metabroker to now process Horton
            Works...one issue


; )   No, absolutely not, lol. It's all still experimentation, but I
looked at what that returns and decided there was not enough detail. So I
have been chasing the processes themselves, which have clear illustrations
of inputs and outputs.

Ernie









From:   Russell Anderson/Worcester/IBM
To:     Ernie Ostic/Newark/IBM@IBMUS
Cc:     Russell Anderson/Worcester/IBM@IBMUS
Date:   07/06/2016 03:34 PM
Subject:        Re: Expanded my Cloudera Metabroker to now process Horton
            Works...one issue


Ernie,

I have created a specific test case lineage whereby I am intentionally
causing it to create a new hive table from an existing table.

You will find that inputs/graph and outputs/graph (if it exists) return
results in terms of 'edges' and 'vertices'.

An edge is the flow between the entities (vertices).

The issue is that JSON gives no assurance about the order of object
members, so the set of edges comes back in no assured order. So I am now
producing lineage but I have no way of determining the proper order.

Second, it appears that the edge results are returning duplicated GUIDs
too.
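
One possible way around both problems is to derive the order from the edges themselves rather than from the position in which they arrive. A minimal sketch in Python, assuming the edges have already been flattened into (source GUID, target GUID) pairs; that pair shape and the function names are illustrative assumptions, not the documented Atlas response format:

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def order_vertices(edges):
    """Return vertex GUIDs in source-to-target order.

    `edges` is an iterable of (source_guid, target_guid) pairs in any
    order, possibly containing duplicates (assumed shape).
    """
    unique_edges = set(edges)  # drop duplicated GUID pairs
    sorter = TopologicalSorter()
    for source, target in unique_edges:
        sorter.add(target, source)  # a target comes after its source
    return list(sorter.static_order())


def order_edges(edges):
    """Return the de-duplicated edges sorted from source side to target side."""
    position = {guid: i for i, guid in enumerate(order_vertices(edges))}
    return sorted(set(edges), key=lambda e: (position[e[0]], position[e[1]]))
```

For a simple chain the result is the same however the pairs are shuffled or repeated. If the edges contain a cycle, `static_order()` raises `graphlib.CycleError`, which would at least flag that no source-to-target order exists.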

So do you think you have got it all working yet?

Regards,

Russell G. Anderson
 Senior Technical Consultant





From:   Ernie Ostic/Newark/IBM
To:     Russell Anderson/Worcester/IBM@ibmus
Cc:     Russell Anderson/Worcester/IBM@ibmus
Date:   07/06/2016 03:29 PM
Subject:        Re: Expanded my Cloudera Metabroker to now process Horton
            Works...one issue



I will take a look later tonight; I have been at a customer all day
today. But in the Atlas work I have been doing, I see it as a custom
effort for each and every meta Type, and then, if that meta type inspires
lineage (such as with type sqoop_process), I can illustrate it. Is that
what you are doing? Which Type did you begin with? Or are you just
capturing generic lineage results/graphs?
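
To show what I mean by a custom effort per meta Type, here is a rough Python sketch of a per-type dispatch. The handler names and the dict keys (`type`, `name`, `inputs`, `outputs`) are hypothetical and only for illustration; `sqoop_process` is the one type name taken from the work described above.

```python
def handle_generic(entity):
    """Default handler: import the asset only, with no lineage."""
    return {"asset": entity["name"], "lineage": False}


def handle_sqoop_process(entity):
    """Handler for a type that carries lineage through inputs and outputs."""
    return {
        "asset": entity["name"],
        "lineage": True,
        "inputs": entity.get("inputs", []),
        "outputs": entity.get("outputs", []),
    }


# One entry per meta type that needs custom treatment (hypothetical table).
HANDLERS = {"sqoop_process": handle_sqoop_process}


def import_entity(entity):
    """Pick the handler for the entity's type, falling back to the generic one."""
    handler = HANDLERS.get(entity["type"], handle_generic)
    return handler(entity)
```

New types are covered by adding one handler and one entry in the table, while everything else still comes in as a plain asset.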

Ernie




On Jul 6, 2016, 3:15:09 PM, [email protected] wrote:

From: [email protected]
To: [email protected]
Cc: [email protected]
Date: Jul 6, 2016 3:15:09 PM
Subject: Expanded my Cloudera Metabroker to now process Horton Works...one
issue

Hi Ernie,

I have managed to expand my Cloudera Metabroker code to now include
Hortonworks Apache Atlas.

I have posted the following :

https://community.hortonworks.com/questions/43778/the-results-returned-from-http192168114421000apiat.html


One really bad thing about JSON is that you cannot assure the order of your
results (see many references on the internet).

So the Inputs/Graph returns "edges" and "vertices". The problem is that
the edges appear in no particular order and, moreover, there appear to be
duplicate references to the GUIDs.

Do you know by what means you are supposed to glean the proper
source-to-target ordering?

I know you have been playing around with this, but I am not sure if you
have gotten this far with understanding the results.

Let me know.

Regards,

Russell G. Anderson
Senior Technical Consultant


