Hi Cheolsoo,

Thanks for your reply.
For now we feel GitHub issues work well for the developers; once we see more issues coming in, we will start a JIRA and file issues there. We are also planning to send a proposal to the Pig dev group soon to hear comments on the project.

On Wed, Jul 16, 2014 at 5:04 AM, Cheolsoo Park <[email protected]> wrote:

> Hi Praveen,
>
> Thank you for sharing your work!
>
> As far as I know, there are quite a few people who are interested in Pig on
> Spark. I am wondering whether we can collaborate to avoid duplicate efforts
> as a community.
>
> Do you think we can create an umbrella JIRA for Pig on Spark and continue
> the discussion there? Once we agree on the design, Pig committers are
> willing to help create a feature branch and commit patches. Please let me
> know what you think.
>
> Thanks,
> Cheolsoo
>
>
> On Tue, Jul 15, 2014 at 7:36 AM, Praveen R <[email protected]> wrote:
>
> > Hi Everyone,
> >
> > We at SigmoidAnalytics have been working on Pig on Spark for some time
> > and would like to hear your thoughts about it.
> >
> > You can find the repo here: https://github.com/sigmoidanalytics/spork
> > The README has been updated to work with Spark 0.9. So far we have
> > tested it on hadoop-1.0.4 and hadoop-2.2.0.
> >
> > Below are some major issues we are having:
> > 1. Sending objects from the driver to the executors. We have built a TCP
> >    server to broadcast
> >    <https://github.com/sigmoidanalytics/spork/commit/b35b57d94c9b0b4dfdf165b30ba8145f65975f23>
> >    data to the executors to achieve this.
> > 2. Large shuffle data when performing groupBy.
> >
> > Please feel free to file issues on the GitHub repo or mail us at:
> > [email protected].
> >
> > Thanks,
> > Praveen R
