[ https://issues.apache.org/jira/browse/CASSANDRA-8234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660443#comment-14660443 ]
Brian Hess commented on CASSANDRA-8234:
----------------------------------------
A few questions:
1. For OSS C* (as opposed to DSE), will the Spark Master be visible to users
other than C* itself? That is, will Cassandra be the only process/user able to
execute Spark jobs, or will users also be able to submit their own jobs, start
the Spark SQL Thrift server, etc.?
2. How current will the bundled Spark be kept relative to Cassandra? Will there
be any guidance (or guarantees) about how stale the included Spark version may
be, or how often Cassandra will be upgraded to incorporate a newer Spark
release? The same questions apply to the OSS spark-cassandra-connector.
3. Will there be a load-sharing system in place so that multiple CTAS queries
can run simultaneously? (By default, Spark in standalone mode will "reserve" all
available cores for one application and "lock out" other Spark jobs.)
4. Will there be some "sandboxing" of Spark so that C* and Spark play nicely
together (with respect to RAM, CPU, etc.)?
5. My assumption is that "CREATE TABLE b(x INT, y INT, z INT) AS SELECT x, y, z
FROM a WITH PRIMARY KEY ((x), y)" [syntax for illustrative purposes only] will
be an asynchronous operation. That is, it will return "success" to the client
immediately, while the copy itself runs in the background. First, is that
correct? If so, I think there will have to be a build status, as there is for
MVs and 2Is, correct? If not, what will we do about this query timing out?
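A minimal sketch of the asynchronous behavior described in question 5, written in Python only for brevity. Everything in it (`CtasJob`, `BuildStatus`, the state names) is hypothetical and is not a Cassandra API; it only models a statement that returns immediately while a background worker copies the selected columns, plus a status the client can poll, similar in spirit to how an MV or 2I build is reported.

```python
import threading
from enum import Enum


class BuildStatus(Enum):
    # Hypothetical states, loosely modeled on reporting a view/index build.
    PENDING = "PENDING"
    RUNNING = "RUNNING"
    SUCCEEDED = "SUCCEEDED"
    FAILED = "FAILED"


class CtasJob:
    """Hypothetical background CTAS: the statement returns at once, the copy runs later."""

    def __init__(self, source_rows, columns):
        self.source_rows = source_rows
        self.columns = columns
        self.target_rows = []
        self.status = BuildStatus.PENDING
        self.error = None
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        # The client gets control back immediately, before any row is copied.
        self._thread.start()
        return self

    def _run(self):
        self.status = BuildStatus.RUNNING
        try:
            for row in self.source_rows:
                # Project only the selected columns into the target table.
                self.target_rows.append({c: row[c] for c in self.columns})
            self.status = BuildStatus.SUCCEEDED
        except Exception as exc:
            # A failed build is recorded in the status instead of timing out the client.
            self.error = exc
            self.status = BuildStatus.FAILED

    def wait(self, timeout=None):
        # Block up to `timeout` seconds; the status may still be RUNNING afterwards.
        self._thread.join(timeout)
        return self.status


source = [{"x": 1, "y": 2, "z": 3, "w": 9}, {"x": 4, "y": 5, "z": 6, "w": 9}]
job = CtasJob(source, ["x", "y", "z"]).start()  # returns right away
print(job.wait(timeout=5))  # reports the build status once done or timed out
```

The point of the design is that the client-facing timeout and the copy's duration are decoupled: the statement only has to register the job, and progress or failure is surfaced through a status rather than through the original request.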
> CTAS (CREATE TABLE AS SELECT)
> -----------------------------
>
> Key: CASSANDRA-8234
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8234
> Project: Cassandra
> Issue Type: New Feature
> Components: Tools
> Reporter: Robin Schumacher
> Fix For: 3.x
>
>
> A continuous request from users is the ability to do CREATE TABLE AS SELECT.
> The simplest form would be copying the entire table. A more advanced form
> would allow specifying the columns and UDFs to call, as well as filtering
> rows out with a WHERE clause.
> More advanced still would be to get all the way to allowing JOIN, for which
> we probably want to integrate Spark.
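A minimal in-memory sketch of the CTAS forms listed in the issue description, assuming rows are plain Python dicts. The function name `create_table_as_select` and its parameters are hypothetical and this is not Cassandra code; it covers the full-table copy, column projection through UDF-like functions, and WHERE filtering, but not JOIN, which the description suggests would need Spark.

```python
from typing import Any, Callable, Dict, Iterable, List, Optional

Row = Dict[str, Any]


def create_table_as_select(
    source: Iterable[Row],
    select: Optional[Dict[str, Callable[[Row], Any]]] = None,
    where: Optional[Callable[[Row], bool]] = None,
) -> List[Row]:
    """Hypothetical model of CTAS semantics (not Cassandra code).

    select=None copies every column (the simplest form: a full table copy);
    otherwise each output column is computed by a function of the source row,
    which covers both plain column references and UDF calls.
    where=None keeps every row; otherwise only rows for which it returns True.
    """
    target = []
    for row in source:
        if where is not None and not where(row):
            continue  # row filtered out by the WHERE predicate
        if select is None:
            target.append(dict(row))  # shallow copy of the whole row
        else:
            target.append({name: expr(row) for name, expr in select.items()})
    return target


a = [{"x": 1, "y": 2}, {"x": 3, "y": 4}]
copy = create_table_as_select(a)  # simplest form: copy the entire table
doubled = create_table_as_select(
    a,
    select={"x": lambda r: r["x"], "y2": lambda r: r["y"] * 2},  # UDF-like on y
    where=lambda r: r["x"] > 1,
)
# doubled == [{"x": 3, "y2": 8}]
```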
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)