Hi Praveen,

Thank you for sharing your work!
As far as I know, quite a few people are interested in Pig on Spark. I am wondering whether we can collaborate to avoid duplicate efforts as a community. Do you think we can create an umbrella JIRA for Pig on Spark and continue the discussion there? Once we agree on the design, Pig committers are willing to help create a feature branch and commit patches.

Please let me know what you think.

Thanks,
Cheolsoo

On Tue, Jul 15, 2014 at 7:36 AM, Praveen R <[email protected]> wrote:

> Hi Everyone,
>
> We at SigmoidAnalytics have been working on Pig on Spark for some time and
> would like to hear your thoughts about it.
>
> You can find the repo here: https://github.com/sigmoidanalytics/spork
> and the README has been updated to work with Spark 0.9. We have currently
> tested it on hadoop-1.0.4 and hadoop-2.2.0.
>
> Below are some major issues we are having:
> 1. Sending objects from the driver to executors: we have built a TCP
> server to broadcast
> <https://github.com/sigmoidanalytics/spork/commit/b35b57d94c9b0b4dfdf165b30ba8145f65975f23>
> data to executors to achieve this.
> 2. Large shuffle data when performing groupBy.
>
> Please feel free to file issues on the GitHub repo or mail us at:
> [email protected].
>
> Thanks,
> Praveen R
