Re: Spark SQL v MemSQL/Voltdb
Hi Ashish,

Transactions are a big difference between Spark SQL and MemSQL/VoltDB, but there are other differences as well. I'm not an expert on Volt, but another difference between Spark SQL and MemSQL is that DataFrames do not support indexes and MemSQL tables do. This affects how much data a query has to scan, and therefore query execution performance. The recently released MemSQL 4 also contains improvements to the distributed optimizer.

For large, infrequently changing data sets, you could use the MemSQL column store and need only a single system for both storage and query. (Spark does not include storage natively, so with Spark you would need an external data store.) You can also use Spark in combination with MemSQL, with either the row store or the column store, via the MemSQL Spark Connector.

Thanks,
Conor

On Thu, May 28, 2015 at 10:36 PM, Ashish Mukherjee ashish.mukher...@gmail.com wrote:
> Hi Mohit,
>
> Thanks for your reply. If my use case is purely querying read-only data (no transactions), at what scale does one of them become a better option than the other? I am aware that at a scale a single node can support, VoltDB is a better choice. However, once the scale grows to a cluster, which is the right engine at various degrees of scale?
>
> Regards,
> Ashish
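Conor's point about indexes can be illustrated with a toy model. The Python sketch below is not how Spark or MemSQL work internally; it only contrasts a filter that must examine every row (an unindexed scan) with a lookup through a hash index built once up front. The row layout, the `customer_id` column, and the helper names are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows: (order_id, customer_id, amount); 100 distinct customers.
rows = [(i, i % 100, i * 1.5) for i in range(10_000)]


def full_scan(rows, customer_id):
    """Filter by inspecting every row, as a query over unindexed data must."""
    examined = 0
    hits = []
    for row in rows:
        examined += 1
        if row[1] == customer_id:
            hits.append(row)
    return hits, examined


def build_index(rows):
    """Map each customer_id to the positions of its rows (a simple hash index)."""
    index = defaultdict(list)
    for pos, row in enumerate(rows):
        index[row[1]].append(pos)
    return index


def indexed_lookup(rows, index, customer_id):
    """Touch only the rows the index points at."""
    positions = index.get(customer_id, [])
    return [rows[p] for p in positions], len(positions)


index = build_index(rows)
scan_hits, scanned = full_scan(rows, 42)
idx_hits, touched = indexed_lookup(rows, index, 42)
print(scanned, touched)  # 10000 100
```

Both paths return the same 100 rows, but the indexed lookup touches only those rows instead of all 10,000. The trade-off is the memory for the index and the work of keeping it up to date on every write.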
Re: Spark SQL v MemSQL/Voltdb
I have used VoltDB and Spark. The use cases for the two are quite different. VoltDB is intended for transactions and also supports queries on the same (custom to VoltDB) store. Spark (SQL) is NOT suitable for transactions; it is designed for querying immutable data (which may live in several different kinds of stores).

On May 28, 2015, at 7:48 AM, Ashish Mukherjee ashish.mukher...@gmail.com wrote:
> Hello,
>
> I was wondering if there is any documented comparison of Spark SQL with in-memory SQL databases like MemSQL/VoltDB. MemSQL and similar systems also allow queries to be run in a clustered environment. What is the major differentiator?
>
> Regards,
> Ashish
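Mohit's distinction rests on transactions: several writes that either all commit or all roll back. As a minimal sketch, the Python code below uses the standard-library `sqlite3` module purely as a stand-in for a transactional store (it is not VoltDB's or MemSQL's API); the `accounts` table and the `transfer` helper are hypothetical. This commit/rollback behavior is what Mohit says Spark SQL does not provide.

```python
import sqlite3

# In-memory SQLite database standing in for a transactional store.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst atomically; return False if it was rolled back."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            # Credit first, so a failed debit shows the credit being undone too.
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:  # raised when the CHECK constraint fails
        return False


ok = transfer(conn, "alice", "bob", 30)        # alice 70, bob 80
failed = transfer(conn, "bob", "alice", 500)   # bob would go negative
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, failed, balances)
```

In the failed transfer, alice is credited before the debit on bob violates the CHECK constraint. Because both updates run in one transaction, the rollback undoes the credit as well, leaving the balances at 70 and 80.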
Re: Spark SQL v MemSQL/Voltdb
Hi Mohit,

Thanks for your reply. If my use case is purely querying read-only data (no transactions), at what scale does one of them become a better option than the other? I am aware that at a scale a single node can support, VoltDB is a better choice. However, once the scale grows to a cluster, which is the right engine at various degrees of scale?

Regards,
Ashish

On Fri, May 29, 2015 at 6:57 AM, Mohit Jaggi mohitja...@gmail.com wrote:
> I have used VoltDB and Spark. The use cases for the two are quite different. VoltDB is intended for transactions and also supports queries on the same (custom to VoltDB) store. Spark (SQL) is NOT suitable for transactions; it is designed for querying immutable data (which may live in several different kinds of stores).
Spark SQL v MemSQL/Voltdb
Hello,

I was wondering if there is any documented comparison of Spark SQL with in-memory SQL databases like MemSQL/VoltDB. MemSQL and similar systems also allow queries to be run in a clustered environment. What is the major differentiator?

Regards,
Ashish