Hi Everyone,

We at SigmoidAnalytics have been working on Pig on Spark for some time and
would like to hear your thoughts about it.

You can find the repo here: https://github.com/sigmoidanalytics/spork and
the README has been updated to work with Spark 0.9. So far we have
tested it on hadoop-1.0.4 and hadoop-2.2.0.

Below are some major issues we are having:
1. Sending objects from the driver to executors. We have built a TCP server to
broadcast
<https://github.com/sigmoidanalytics/spork/commit/b35b57d94c9b0b4dfdf165b30ba8145f65975f23>
data to executors to achieve this.
2. Large shuffle data when performing groupBy.
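
To make issue 2 concrete, here is a minimal sketch in plain Python. It is not
Spork or Spark code, and the function names and sample data are hypothetical.
It assumes the aggregation that follows the groupBy is a sum, which can be
partially computed inside each partition before anything is shuffled; that
only works for aggregations that can be combined this way.

```python
from collections import defaultdict

def shuffle_without_combine(partitions):
    """Plain groupBy: every (key, value) record is sent across the network."""
    return [record for part in partitions for record in part]

def shuffle_with_combine(partitions):
    """Sum values per key inside each partition first, then send
    one record per key per partition."""
    shuffled = []
    for part in partitions:
        partial = defaultdict(int)
        for key, value in part:
            partial[key] += value
        shuffled.extend(partial.items())
    return shuffled

def reduce_side_sum(shuffled):
    """Final per-key sum computed after the shuffle."""
    totals = defaultdict(int)
    for key, value in shuffled:
        totals[key] += value
    return dict(totals)

# Hypothetical sample data: two partitions of (key, value) records.
partitions = [
    [("a", 1), ("b", 2), ("a", 3)],
    [("a", 4), ("b", 5), ("b", 6)],
]

plain = shuffle_without_combine(partitions)
combined = shuffle_with_combine(partitions)

print(len(plain), len(combined))
print(reduce_side_sum(plain) == reduce_side_sum(combined))
```

Both paths give the same per-key totals, but the combined path shuffles fewer
records whenever a key repeats within a partition.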

Please feel free to file issues on the GitHub repo or mail us at:
[email protected].

Thanks,
Praveen R
