Hi Hemanth,

Thank you for taking the time to respond. I will take a look at ATLAS-51, and I would also be interested in hearing from others, as you alluded to in your response.

Cheers,
Sandeep

On Sun, Dec 4, 2016 at 5:09 AM, Hemanth Yamijala <hyamij...@hortonworks.com> wrote:

> Hi Sandeep,
>
> Responses inline. Hoping others can pitch in with more recent
> information, as mine might be a little dated.
>
> Thanks
> hemanth
> ________________________________________
> From: Sandeep Nayak <datacacoph...@gmail.com>
> Sent: Sunday, December 04, 2016 12:00 AM
> To: dev@atlas.incubator.apache.org
> Cc: Venkatesh Seetharam
> Subject: Re: Interest in Apache Atlas
>
> Hi all,
>
> Sending a reminder: I am looking for answers to the questions below.
> Can someone help?
>
> Thanks in advance for your attention.
>
> - Sandeep
>
> On Thu, Dec 1, 2016 at 12:13 AM, Sandeep Nayak <datacacoph...@gmail.com> wrote:
>
> > Hi all,
> >
> > I asked Venkatesh a couple of questions earlier; please see the email
> > below. He recommended that I move the questions to the dev mailing
> > list, hence this mail.
> >
> > To follow up on the questions asked below in response to my queries:
> >
> > (a) Multi-tenancy: If I were to bring in data-sets from different
> > customers, then I need to record, annotate or tag, and provide access
> > to data-sets only to the relevant owners. Is it possible for me to
> > record and manage data-sets for different customers in a single Atlas
> > instance? Does Atlas provide me with the necessary constructs to
> > separate recording of data-sets by tenant and tracking of metadata
> > etc. by tenant?
>
> It is possible to build a solution on top of Atlas to satisfy your
> requirements. It appears you need a namespacing facility of sorts.
> While there is no native construct like that in Atlas today (please see
> ATLAS-51, which is still open), I guess you could rely on the
> extensibility of the type system to let your objects extend from a base
> type that defines a tenant attribute. Then use wrapper APIs that filter
> out objects according to the tenant in question. Of course, one could
> use the lower-level APIs to get around this, and hence it is
> cooperative in nature.
>
> > (c) Performance Numbers: I understand it is built to scale given the
> > use of HBase, but any performance numbers that can be shared will be
> > helpful. E.g., is there a limit to the number of data-sets I can
> > record on Atlas? Are there performance numbers on the number of
> > queries?
>
> This is dated information (at least a couple of months old). If someone
> has updated numbers, we should hear from them. At that time, we tested
> importing 50K Hive tables and dependent objects (columns etc.) with a
> total of just under 10M vertices.
>
> From what I remember, I think we could import these in about 20 minutes
> or so. However, this does make some assumptions about the dependencies
> on the data sets, and hence we could bump up parallelism for import. We
> tested reads with queries from 30 users in parallel. Times vary based
> on the type of query: simple lookups take seconds, but more complex
> queries like lineage take longer.
>
> This is a constant source of improvement in the project, and there are
> several JIRAs talking about performance changes, including some that
> are still open, e.g. ATLAS-711.
>
> > (d) Are there companies using Atlas in production at this stage?
> >
> > Thanks in advance for your responses.
> >
> > - Sandeep
> >
> > On Fri, Nov 18, 2016 at 9:10 AM, Venkatesh Seetharam <venkat...@apache.org> wrote:
> >
> >> Sandeep - please use the dev mailing list for Atlas for a prompt
> >> response.
> >>
> >> (a) How can one achieve multi-tenancy on Apache Atlas?
> >> Can you please elaborate? You can always have a package structure
> >> for your data sets.
> >>
> >> (b) Is Atlas ready for production usage?
> >> It depends. I think it is, but it needs some scripting around BCP,
> >> etc.
> >>
> >> (c) Are there published numbers on the volume of data-sets Atlas can
> >> manage?
> >> It's built to scale; it uses Titan & HBase as a backend store, which
> >> is known to scale.
> >>
> >> On Fri, Nov 4, 2016 at 12:02 PM Sandeep Nayak <datacacoph...@gmail.com> wrote:
> >>
> >>> Hi Venkatesh,
> >>>
> >>> I apologize for the direct email; if there is a better channel to
> >>> surface my questions, I will be happy to go there. I am subscribed
> >>> to dev@atlas but thought that may not be the right forum for
> >>> questions potential Atlas users may have.
> >>>
> >>> I am looking for Data Catalog solutions and am in early evaluation.
> >>> From what I have read so far, it appears Apache Atlas provides most
> >>> of the capabilities I am looking for, namely data-set registration,
> >>> lineage tracking, access control (via Ranger), and auditing, to
> >>> name a few.
> >>>
> >>> I do have a couple of questions which will help me in my evaluation:
> >>>
> >>> (a) How can one achieve multi-tenancy on Apache Atlas?
> >>> (b) Is Atlas ready for production usage?
> >>> (c) Are there published numbers on the volume of data-sets Atlas can
> >>> manage? One of the requirements I pointed out above is data lineage,
> >>> and if I am ingesting streaming and batch data sets the typical
> >>> volumes could be very high.
> >>>
> >>> Hoping you will point me in the right direction to get answers.
> >>>
> >>> Thanks for your time and help.
> >>>
> >>> Regards,
> >>>
> >>> Sandeep
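
For reference, the multi-tenancy approach Hemanth outlines above (a base type that defines a tenant attribute, plus wrapper APIs that filter by tenant) might be sketched roughly as follows. This is a minimal in-memory illustration in Python and not Atlas code: TenantScoped, DataSet, InMemoryStore and TenantCatalog are all hypothetical names, and a real solution would define the base type through the Atlas type system and call the Atlas APIs where this sketch uses a dictionary.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class TenantScoped:
    """Hypothetical base type: every catalog object carries a tenant attribute."""
    name: str
    tenant: str


@dataclass
class DataSet(TenantScoped):
    """Hypothetical entity type that extends the tenant-aware base type."""
    tags: List[str] = field(default_factory=list)


class InMemoryStore:
    """Stand-in for the lower-level store; it does no tenant filtering at all."""

    def __init__(self) -> None:
        self._entities: Dict[Tuple[str, str], TenantScoped] = {}

    def put(self, entity: TenantScoped) -> None:
        # Key on (tenant, name) so two tenants can reuse the same data-set name.
        self._entities[(entity.tenant, entity.name)] = entity

    def all(self) -> List[TenantScoped]:
        return list(self._entities.values())


class TenantCatalog:
    """Hypothetical wrapper API: stamps the tenant on writes, filters on reads."""

    def __init__(self, store: InMemoryStore, tenant: str) -> None:
        self._store = store
        self._tenant = tenant

    def register(self, name: str, tags: Optional[List[str]] = None) -> DataSet:
        data_set = DataSet(name=name, tenant=self._tenant, tags=list(tags or []))
        self._store.put(data_set)
        return data_set

    def list(self) -> List[TenantScoped]:
        # Only objects belonging to this wrapper's tenant are returned.
        return [e for e in self._store.all() if e.tenant == self._tenant]


store = InMemoryStore()
acme = TenantCatalog(store, "acme")
globex = TenantCatalog(store, "globex")
acme.register("orders", tags=["pii"])
globex.register("orders")
globex.register("clicks")
```

Here acme.list() returns only the "acme" data-set, while store.all() still returns everything. That is the cooperative nature mentioned in the reply: the isolation holds only as long as callers go through the wrapper rather than the lower-level store.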