Re: Pig on Spark

Praveen R Thu, 17 Jul 2014 03:54:07 -0700

Hi Cheolsoo,

Thanks for your reply.


For now, we felt GitHub issues would work well for the developers. Once we
see more issues coming in, we will start a JIRA and file issues there.

Also, we plan to send a proposal to the Pig dev group soon to hear
comments on the project.


On Wed, Jul 16, 2014 at 5:04 AM, Cheolsoo Park <[email protected]> wrote:

> Hi Praveen,
>
> Thank you for sharing your work!
>
> As far as I know, there are quite a few people who are interested in Pig on
> Spark. I am wondering whether we can collaborate to avoid duplicate
> efforts as a community.
>
> Do you think we can create an umbrella JIRA for Pig on Spark and continue
> the discussion there? Once we agree on the design, Pig committers are
> willing to help create a feature branch and commit patches. Please let me
> know what you think.
>
> Thanks,
> Cheolsoo
>
>
> On Tue, Jul 15, 2014 at 7:36 AM, Praveen R <[email protected]>
> wrote:
>
> > Hi Everyone,
> >
> > We at SigmoidAnalytics have been working on Pig on Spark for some time and
> > would like to hear your thoughts about it.
> >
> > You can find the repo here: https://github.com/sigmoidanalytics/spork and
> > the README has been updated to work with Spark 0.9. So far we have
> > tested it on hadoop-1.0.4 and hadoop-2.2.0.
> >
> > Below are some major issues we are having:
> > 1. Sending objects from the driver to executors: we have built a TCP server
> > to broadcast data to executors to achieve this
> > <https://github.com/sigmoidanalytics/spork/commit/b35b57d94c9b0b4dfdf165b30ba8145f65975f23>.
> > 2. Large shuffle data when performing groupBy.
> >
> > Please feel free to file issues on the github repo or mail us at:
> > [email protected].
> >
> > Thanks,
> > Praveen R
> >
>
