Re: Gobblin on Yarn ?

Jay Sen Tue, 02 Apr 2019 23:17:41 -0700

Hi Sudarshan

MR mode will have a dependency on a Hadoop cluster; I am thinking of having an
independent Gobblin cluster for all the data movement jobs.
Also, I have tried Hive-Distcp
<https://gobblin.readthedocs.io/en/latest/case-studies/Hive-Distcp/> in
cluster mode and managed to run it. (A lot of the required configs are missing;
I was only able to figure them out from the code base.)


Is there any difference for MR vs Cluster mode in terms of performance or
feature set?

btw, Regarding GOBBLIN-714, I have lost the log, but this couldnt very edge
case, but for GOBBLIN-711
<https://issues.apache.org/jira/browse/GOBBLIN-711> I have captured all the
logs.

Thanks
Jay




On Tue, Apr 2, 2019 at 9:20 PM Sudarshan Vasudevan <[email protected]>
wrote:

> Hi Jay,
> For your immediate use case, will the MR mode work? If that is the case,
> you can take a look at Hive Distcp:
> https://gobblin.readthedocs.io/en/latest/case-studies/Hive-Distcp/
>
> For GOBBLIN-714, can you attach any relevant stacktraces that you see in
> the cluster logs that indicate the failure of the jobs? It is interesting
> that the Job execution state for most of the jobs is shown as COMMITTED as
> opposed to SUCCESSFUL.
>
> Thanks,
> Sudarshan
>
>
> ------------------------------
> *From:* Jay Sen <[email protected]>
> *Sent:* Tuesday, April 2, 2019 8:02 PM
> *To:* Sudarshan Vasudevan; [email protected]
> *Subject:* Re: Gobblin on Yarn ?
>
> Thanks Sudarshan for sharing the info.
>
> I started playing around with Gobblin cluster (master/worker) mode and came
> across some weird issues (GOBBLIN-714
> <https://issues.apache.org/jira/browse/GOBBLIN-714>
> & GOBBLIN-711
> <https://issues.apache.org/jira/browse/GOBBLIN-711>).
>
> I assume the standalone mode is limited to a single node (maybe
> multi-process), so I really need a cluster environment capable of tolerating
> node failures, etc.
>
> The immediate use case I am looking at is Hive to Hive, with 10 TB a day
> overall.
>
> Please let me know your thoughts.
>
> Thanks
> Jay
>
> On Sun, Mar 31, 2019 at 8:29 PM Sudarshan Vasudevan <
> [email protected]> wrote:
>
> Hi Jay,
> We run both Gobblin Cluster and Gobblin Standalone in production, which
> are both fairly stable. We also run Gobblin pipelines in Mapreduce mode in
> production.
>
> There is some recent interest to revive Gobblin-on-Yarn for a few internal
> use cases. We will hopefully have something to share on that front. So stay
> tuned!
>
> If you share more details about your use case (e.g. details about the
> source/sink, volume of data to be moved), that will help us point you in
> the right direction.
>
> Best,
> Sudarshan
> ------------------------------
> *From:* Jay Sen <[email protected]>
> *Sent:* Sunday, March 31, 2019 7:07 PM
> *To:* [email protected]
> *Subject:* Re: Gobblin on Yarn ?
>
> Hi All,
>
> What would be the most stable mode in Gobblin to run in production:
> cluster (master + worker), standalone, or some other mode?
>
> Which mode are you running in prod? Can you please share?
>
> Thanks
> Jay
>
>
> On Wed, Feb 27, 2019 at 6:16 PM Jay Sen <[email protected]> wrote:
>
> > Hi,
> >
> > Is anybody running Gobblin in YARN mode in production, or even in a dev
> > environment? Can you please share your experience?
> >
> > I am looking for some data points on how it would be beneficial over
> > standalone.
> >
> > Thanks
> > Jay
> >
>
>
