Hi there. I am not really in a position to say what role Spark can play in Bigtop, as I do not speak for either of those projects.
However, I can say that Spark provides a number of advantages compared to the traditional MapReduce model: a stateful computational model with no need to write everything back to the file system after each step, in-memory calculations, higher-level primitives expressed in a functional language, etc. These advantages, combined with a low-latency planner, result in a very significant performance improvement. I'd suggest going over spark-project.org for more information.

I am not an expert on Drill, but I'd say that Spark gives immediate benefits over Drill because it is already here and can be used by anyone ;)

As for integration with Bigtop: Spark doesn't require any special integration with the rest of the stack. It might use HDFS as the underlying storage, but that's about it. It looks like there is ongoing development to allow Spark to use Hive's SerDes, but I am not completely sure about its status.

On Mon, Sep 24, 2012 at 09:59PM, Roman Shaposhnik wrote:
> On Mon, Sep 24, 2012 at 8:52 PM, Anatoli Fomenko <afome...@yahoo.com> wrote:
> > Hi Alef,
> >
> > Great news!
> >
> > Spark developers are interested in developing Spark packages and
> > contributing them to open source. Since you already have them,
> > what would you think about contributing the source to BigTop?

We don't have any plans of holding the sources of the packages back, but we are working on rpm packaging right now. Once the work is over, we should be able to contribute it back to the community. Shall there be a JIRA ticket for that or something?

With regards,
Alef
MTG dev team

> This is very, very interesting indeed! I'd also like to hear a bit
> more about what role Spark can play in Bigtop project -- from
> just skimming the web it feels like it can be seen as an
> alternative to Apache Drill (incubating) or am I completely off
> base here?
>
> Also, what level of integration is required between Spark and
> the rest of Hadoop ecosystem components (Hive, Pig, etc.)?
>
> Thanks,
> Roman.