[jira] [Commented] (CASSANDRA-8234) CTAS (CREATE TABLE AS SELECT)

Brian Hess (JIRA) Thu, 06 Aug 2015 10:44:57 -0700

    [ https://issues.apache.org/jira/browse/CASSANDRA-8234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660443#comment-14660443 ]


 Brian Hess commented on CASSANDRA-8234:
----------------------------------------

A few questions:
1. For OSS C* (as opposed to DSE), will the Spark Master be visible to users 
other than C* itself?  As in, will Cassandra be the only process/user able to 
execute Spark jobs?  Or will users be able to submit jobs, start the SparkSQL 
thriftserver, etc?
2. How current will Spark be kept with Cassandra?  Will there be any guidance 
(or guarantees) about how stale the bundled Spark may be, or how often 
Cassandra will be upgraded to incorporate a new Spark release?  Same for the 
OSS spark-cassandra-connector.
3. Will there be a load-sharing system in place so that multiple CTAS queries 
can run simultaneously (Spark in stand-alone mode will by default "reserve" all 
available cores and "lock out" another Spark job)?
4. Will there be some "sandboxing" of Spark so that C* and Spark play nicely 
(with respect to RAM, CPU, etc)?
5. My assumption is that "CREATE TABLE b(x INT, y INT, z INT) AS SELECT x, y, z 
FROM a WITH PRIMARY KEY ((x), y)" [syntax for illustrative purposes only] will 
be an asynchronous operation.  That is, it will return "success" to the client, 
but the operation will be a background operation.  First, is that correct?  If 
so, I think there will have to be a status like in MVs and 2Is, correct?  If 
not, what will we do about this query timing out?
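
To make question 3 concrete, here is a toy model of the core "reservation" behaviour. It is not Spark code: the `App` class and `allocate` function are hypothetical names made up for illustration, and the only assumption carried over from Spark is that a stand-alone application with no `spark.cores.max` setting takes every core that is still free, in submission order.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class App:
    """Hypothetical stand-in for a submitted Spark application."""
    name: str
    cores_max: Optional[int] = None  # None models an unset spark.cores.max


def allocate(total_cores: int, apps: List[App]) -> Dict[str, int]:
    """Grant cores to apps in submission (FIFO) order.

    An app without a cap is given every core still free, which is the
    "reserve everything" behaviour described in question 3. A capped app
    takes at most its cap, leaving the rest for later submissions.
    """
    free = total_cores
    granted: Dict[str, int] = {}
    for app in apps:
        want = free if app.cores_max is None else min(app.cores_max, free)
        granted[app.name] = want
        free -= want
    return granted


# Two CTAS jobs on an 8-core cluster with no caps: the second gets nothing.
uncapped = allocate(8, [App("ctas_1"), App("ctas_2")])

# The same two jobs capped at 4 cores each can run side by side.
capped = allocate(8, [App("ctas_1", 4), App("ctas_2", 4)])

print(uncapped)
print(capped)
```

In this model the second uncapped job waits until the first one finishes, so any load-sharing scheme for concurrent CTAS queries would presumably need a per-job cap (or a cluster-wide default such as `spark.deploy.defaultCores`) chosen by Cassandra rather than left to the Spark defaults.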


> CTAS (CREATE TABLE AS SELECT)
> -----------------------------
>
>                 Key: CASSANDRA-8234
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8234
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Tools
>            Reporter: Robin Schumacher
>             Fix For: 3.x
>
>
> A continuous request from users is the ability to do CREATE TABLE AS SELECT.  
> The simplest form would be copying the entire table.  More advanced would 
> allow specifying the columns and UDFs to call, as well as filtering rows out 
> in WHERE.
> More advanced still would be to get all the way to allowing JOIN, for which 
> we probably want to integrate Spark.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
