Hi All,

To build on the prior discussions on Airavata Data Lake [1], [2], [3], our next
big step is to make implementation choices. Apache Airflow [4] looks like an
unparalleled choice. If any of you are interested, I will be happy to provide a
detailed breakdown of this evaluation.

In contrast, the choice of a metadata catalog is tricky, given the overwhelming
number of competing options, each with its own strengths. The best way forward
seems to be for us to document the capabilities that are important to Airavata,
hold a hackathon exploring each of the choices, and settle on one. Magda [5]
and Atlas [6] both looked promising but do not natively support multi-tenancy.
Can we all explore DataHub [7], Amundsen [8], and Metacat [9] together? There
are more options, but I listed the ones with a wide contributor base.

Thoughts?

Cheers,
Suresh

[1] - https://markmail.org/thread/cjasb2m5ag6hb7y6
[2] - https://markmail.org/thread/z2arxbby6xxb57pq
[3] - https://github.com/apache/airavata-data-lake
[4] - https://airflow.apache.org/
[5] - https://magda.io/
[6] - https://atlas.apache.org/#/
[7] - https://github.com/linkedin/datahub
[8] - https://github.com/amundsen-io/amundsen
[9] - https://github.com/Netflix/metacat
