Hi Jay,

For your immediate use case, will MR mode work? If so, you can take a look at Hive Distcp: https://gobblin.readthedocs.io/en/latest/case-studies/Hive-Distcp/
For GOBBLIN-714, can you attach any relevant stack traces from the cluster logs that indicate why the jobs failed? It is interesting that the job execution state for most of the jobs is shown as COMMITTED as opposed to SUCCESSFUL.

Thanks,
Sudarshan

________________________________
From: Jay Sen <[email protected]>
Sent: Tuesday, April 2, 2019 8:02 PM
To: Sudarshan Vasudevan; [email protected]
Subject: Re: Gobblin on Yarn ?

Thanks Sudarshan for sharing the info.

I started playing around with Gobblin cluster (master/worker) mode and came across some weird issues (GOBBLIN-714 <https://issues.apache.org/jira/browse/GOBBLIN-714> and GOBBLIN-711 <https://issues.apache.org/jira/browse/GOBBLIN-711>). I assume standalone mode is limited to a single node (maybe multi-process), so I really need a cluster environment capable of tolerating node failures, etc.

The immediate use case I am looking at is Hive to Hive, with about 10 TB a day overall.

Please let me know your thoughts.

Thanks
Jay

On Sun, Mar 31, 2019 at 8:29 PM Sudarshan Vasudevan <[email protected]> wrote:

Hi Jay,

We run both Gobblin Cluster and Gobblin Standalone in production, and both are fairly stable. We also run Gobblin pipelines in MapReduce mode in production. There is some recent interest in reviving Gobblin-on-Yarn for a few internal use cases. We will hopefully have something to share on that front. So stay tuned!
If you share more details about your use case (e.g. details about the source/sink, volume of data to be moved), that will help us point you in the right direction.

Best,
Sudarshan

________________________________
From: Jay Sen <[email protected]>
Sent: Sunday, March 31, 2019 7:07 PM
To: [email protected]
Subject: Re: Gobblin on Yarn ?

Hi All,

What would be the most stable mode in Gobblin to run in production: cluster (master + worker), standalone, or something else? Which mode are you running in prod? Could you please share?

Thanks
Jay

On Wed, Feb 27, 2019 at 6:16 PM Jay Sen <[email protected]> wrote:
> Hi,
>
> Is anybody running Gobblin in Yarn mode in production, or even in a dev
> environment? Can you please share your experience?
>
> I am looking for some data points on how it would be beneficial over standalone.
>
> Thanks
> Jay
>
