Hi Sudarshan,

MR mode would bring a dependency on a Hadoop cluster, and I am thinking of having an independent Gobblin cluster for all the data movement jobs. I have also tried Hive-Distcp <https://gobblin.readthedocs.io/en/latest/case-studies/Hive-Distcp/> in cluster mode and managed to run it (a lot of the configs are missing, and I was only able to figure them out from the code base).
Is there any difference between MR and cluster mode in terms of performance or feature set?

By the way, regarding GOBBLIN-714, I have lost the log, but this could be a very rare edge case. For GOBBLIN-711 <https://issues.apache.org/jira/browse/GOBBLIN-711>, I have captured all the logs.

Thanks
Jay

On Tue, Apr 2, 2019 at 9:20 PM Sudarshan Vasudevan <[email protected]> wrote:

> Hi Jay,
> For your immediate use case, will the MR mode work? If that is the case,
> you can take a look at Hive Distcp:
> https://gobblin.readthedocs.io/en/latest/case-studies/Hive-Distcp/
>
> For GOBBLIN-714, can you attach any relevant stacktraces that you see in
> the cluster logs that indicate the failure of the jobs? It is interesting
> that the Job execution state for most of the jobs is shown as COMMITTED as
> opposed to SUCCESSFUL.
>
> Thanks,
> Sudarshan
>
>
> ------------------------------
> *From:* Jay Sen <[email protected]>
> *Sent:* Tuesday, April 2, 2019 8:02 PM
> *To:* Sudarshan Vasudevan; [email protected]
> *Subject:* Re: Gobblin on Yarn ?
>
> Thanks Sudarshan for sharing the info.
>
> I started playing around with gobblin cluster ( master/worker ) mode and came
> across some weird issues ( GOBBLIN-714
> <https://issues.apache.org/jira/browse/GOBBLIN-714>
> & GOBBLIN-711
> <https://issues.apache.org/jira/browse/GOBBLIN-711>
> ).
>
> I assume the standalone mode is limited to a single node ( may be multi
> process ), so I really need a cluster environment capable of tolerating node
> failures, etc...
>
> the immediate use-case i am looking at is hive to hive with overall 10TB a
> day.
>
> Pls let me know ur thoughts.
>
> Thanks
> Jay
>
> On Sun, Mar 31, 2019 at 8:29 PM Sudarshan Vasudevan <
> [email protected]> wrote:
>
> Hi Jay,
> We run both Gobblin Cluster and Gobblin Standalone in production, which
> are both fairly stable. We also run Gobblin pipelines in Mapreduce mode in
> production.
>
> There is some recent interest to revive Gobblin-on-Yarn for a few internal
> use cases. We will hopefully have something to share on that front. So stay
> tuned!
>
> If you share more details about your use case (e.g. details about the
> source/sink, volume of data to be moved), that will help us point you in
> the right direction.
>
> Best,
> Sudarshan
> ------------------------------
> *From:* Jay Sen <[email protected]>
> *Sent:* Sunday, March 31, 2019 7:07 PM
> *To:* [email protected]
> *Subject:* Re: Gobblin on Yarn ?
>
> Hi All,
>
> What would be the most stable mode in gobblin to run on production ?
> cluster ( master + worker ) or standalone or any other ?
>
> what is the mode you are running on prod ? can u guys pls share ?
>
> Thanks
> Jay
>
>
> On Wed, Feb 27, 2019 at 6:16 PM Jay Sen <[email protected]> wrote:
>
> > Hi,
> >
> > anybody running Gobblin on yarn mode in production or even in dev
> > environment ? can u pls share the experience?
> >
> > looking for some data points on how it would be beneficial over standalone.
> >
> > Thanks
> > Jay
> >
> >
