Also strongly +1 to copying all the options.
On Wed, Oct 23, 2024 at 5:52 AM Josh McKenzie <jmcken...@apache.org> wrote: > I'm a very strong +1 to having the default functionality be to copy *ALL* > options. > > Intuitively, as a user, if I tell a software system to make a clone of > something I don't expect it to be shallow or a subset defined by some > external developer somewhere. I expect it to be a clone. > > Adding in some kind of "lean" mode or "column only" is fine if someone can > make a cogent argument around its inclusion. I don't personally see a > use-case for it right now but definitely open to being educated. > > On Wed, Oct 23, 2024, at 3:03 AM, Štefan Miklošovič wrote: > > options are inherently part of that table as well, same as schema. In > fact, _schema_ includes all options. Not just columns and its names. If you > change some option, you effectively have a different schema, schema version > changes by changing an option. So if we do not copy options too, we are > kind of faking it (when we do not specify WITH OPTIONS). > > Also, imagine a situation where Accord is merged to trunk. It introduces a > new schema option called "transactional = full" which is not default. (I am > sorry if I did the spelling wrong here). So, when you have a table with > transactional support and you do "create table ks.tb_copy like ks.tb", when > you _do not_ copy all options, this table will _not_ become transactional. > > The next thing you go to do is to execute some transactions against this > table but well ... you can not do that, because your table is not > transactional, because you have forgotten to add "WITH OPTIONS". So you > need to go back to that and do "ALTER ks.tb_copy WITH transactional = full" > just to support that. > > I think that you see from this pattern that it is way better if we copy > all options by default instead of consciously opt-in into them. > > also: > > "but I think there are also some users want to do basic column information > copy" > > where is this coming from? Do you have this idea somehow empirically > tested? I just do not see why somebody would want to have Cassandra's > defaults instead of what a base table contains. > > On Wed, Oct 23, 2024 at 8:28 AM guo Maxwell <cclive1...@gmail.com> wrote: > > The reason for using OPTION keyword is that I want to provide users with > more choices . > The default behavior for copying a table is to copy the basic item of > table (column and their data type,mask,constraint),others thing belongs to > the table like option,views,trigger > are optional in my mind. > You are absolutely right that users may want to copy all stuff but I think > there are aslo some users want to do basic column information copy,So I > just give them a choice。As we know that the number of table parameters is > not small,compression,compaction,gc_seconds,bf_chance,speculative_retry and > so on. > > Besides we can see that pg have also the keyword COMMENT,COMPRESSION which > have the similar behavior as our OPTION keyword。 > > So that is why I add this keyword OPTION. > > > Štefan Miklošovič <smikloso...@apache.org>于2024年10月22日 周二下午11:40写道: > > The problem is that when I do this minimal CQL which shows this feature: > > CREATE TABLE ks.tb_copy LIKE ks.tb; > > then you are saying that when I _do not_ specify WITH OPTIONS then I get > Cassandra's defaults. Only after I specify WITH OPTIONS, it would truly be > a copy. > > This is not a good design. Because to have an exact copy, I have to make a > conscious effort to include OPTIONS as well. That should not be the case. I > just want to have a copy, totally the same stuff, when I use the minimal > version of that statement. It would be better to opt-out from options like > > CREATE TABLE ks.tb_copy LIKE ks.tb WITHOUT OPTIONS (you feel me) but we do > not support this (yet). > > On Tue, Oct 22, 2024 at 5:28 PM Štefan Miklošovič <smikloso...@apache.org> > wrote: > > I just don't see OPTIONS as important. When I want to copy a table, I am > copying a table _with everything_. Options included, by default. Why would > I want to have a copy of a table with options different from the base one? > > > On Mon, Oct 21, 2024 at 3:55 PM Bernardo Botella < > conta...@bernardobotella.com> wrote: > > Hi Guo, > > +1 for the CONSTRAINTS keyword to be added into the default behavior. > > Bernardo > > On Oct 21, 2024, at 12:01 AM, guo Maxwell <cclive1...@gmail.com> wrote: > > I think the CONSTRAINTS keyword keyword may be in the same situation as > datamask. > Maybe it is better to include constraints into the default behavior of > table copy together with column name, column data type and data mask. > > guo Maxwell <cclive1...@gmail.com> 于2024年10月21日周一 14:56写道: > > To yifan : > I don't mind adding the ALL keyword, and it has been updated into CEP. > > As all you can see, our original intention was that the grammar would not > be too complicated, which is what I described in cep > <https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-43++Apache+Cassandra+CREATE+TABLE++LIKE> > . > We gave up PG-related grammar, including INCLUDING/EXCLUDING and so on . > > guo Maxwell <cclive1...@gmail.com> 于2024年10月21日周一 14:52写道: > > Hi , > To sefan : > I may want to explain that if there is no OPTION keyword in the CQL > statement, then the newly created table will only have the > original table's column name 、column type and data mask ,I think this is > the most basic choice when copying tables to users. > Then we do some addition, we can add original table's table options like > compaction strategy/compress strategy、index and so on. > > Recently, I have also thought about the situation of CONSTRAINTS keyword. > I think it is similar to data mask. Agree that it should be included in the > basic options of table copy (column name, column data type , column data > mask and constraints). > > > Dave Herrington <he...@rhinosource.com> 于2024年10月19日周六 01:15写道: > > It seems like a natural extension of the CREATE TABLE statement. Looking > forward to using it in the future. > > -Dave > > On Thu, Oct 17, 2024 at 5:11 PM Štefan Miklošovič <smikloso...@apache.org> > wrote: > > Right?! Reads like English, the impact on the existing CQL is minimal. One > LIKE which basically needs to be there and keywords of logical "components" > which seamlessly integrate with WITH. > > I would _not_ use WITH CONSTRAINTS because constraints will be inherently > part of a table schema. It is not an "option". We can not "opt-out" from > them. Remember we are copying a table here so if a base one has > constraints, its copy will have them too. A user can subsequently "ALTER" > them. > > On Thu, Oct 17, 2024 at 5:31 PM Dave Herrington <he...@rhinosource.com> > wrote: > > Basing it on CREATE TABLE, the BNF definition of the simple implementation > would look something like this: > > create_table_statement::= CREATE TABLE [ IF NOT EXISTS ] table_name LIKE > base_table_name > [ WITH included_objects ] [ [ AND ] table_options ] > table_options::= COMPACT STORAGE [ AND table_options ] > | CLUSTERING ORDER BY '(' clustering_order ')' > [ AND table_options ] | options > clustering_order::= column_name (ASC | DESC) ( ',' column_name (ASC | > DESC) )* > included_objects::= dependent_objects [ AND dependent_objects ] > dependent_objects:= INDEXES | TRIGGERS | CONSTRAINTS | VIEWS > > > CREATE TABLE [ IF NOT EXISTS ] [<keyspace_name>.]<table_name> LIKE > [<keyspace_name>.]<base_table_name> > [ WITH [ <included_objects > ] > [ [ AND ] [ <table_options> ] ] > [ [ AND ] CLUSTERING ORDER BY [ <clustering_column_name> (ASC | DESC) ] ] > ; > > Examples: > > -- Create base table: > CREATE TABLE cycling.cyclist_name ( > id UUID PRIMARY KEY, > lastname text, > firstname text > ); > > -- Create an exact copy of the base table, but do not create any dependent > objects: > CREATE TABLE cycling.cyclist_name2 LIKE cycling.cyclist_name; > > -- Create an exact copy with all dependent objects (constraints excluded > for now): > CREATE TABLE cycling.cyclist_name3 LIKE cycling.cyclist_name > WITH INDEXES AND TRIGGERS AND VIEWS; > > -- Create a copy with LCS compaction, a default TTL and all dependent > objects except indexes: > CREATE TABLE cycling.cyclist_name4 LIKE cycling.cyclist_name > WITH TRIGGERS AND VIEWS > AND compaction = { 'class' : 'LeveledCompactionStrategy' } > AND default_time_to_live = 86400; > > > > > This seems pretty clean & straightforward. > > -Dave > > On Thu, Oct 17, 2024 at 4:05 PM Dave Herrington <he...@rhinosource.com> > wrote: > > This simple approach resonates with me. I think the Cassandra doc uses > "INDEXES" as the plural for index, i.e.: > https://cassandra.apache.org/doc/stable/cassandra/cql/indexes.html > > -Dave > > On Thu, Oct 17, 2024 at 2:39 PM Štefan Miklošovič <smikloso...@apache.org> > wrote: > > Well we could do something like: > > CREATE TABLE ks.tb_copy LIKE ks.tb WITH INDICES AND TRIGGERS AND > compaction = {'class': '.... } AND ... > > but I can admit it might be seen as an overreach and I am not sure at all > how it would look like in the implementation because we would need to > distinguish WITH INDICES from table options. > > I would > > 1. +0 on ALL. - we don't need this. If we have just INDICES, TRIGGERS, > VIEWS at this point, I don't think enumerating it all is too much to ask. > This is just an implementation detail and if we find it necessary we can > add it later. If you feel strongly about this then add that but it is not > absolutely necessary. > 2. omit OPTIONS - aren't all options copied by default? That is the > goal of the CEP, no? We might just use normal CQL while overriding > from the base table > 3. mix keywords like TRIGGERS / INDICES / CONSTRAINTS into normal > table creation statement > > > > On Thu, Oct 17, 2024 at 3:20 PM Yifan Cai <yc25c...@gmail.com> wrote: > > I would second Štefan's option for functionality simplicity. It seems to > be unnecessary to have the keywords for both inclusion and exclusion in the > CEP. If needed, the exclusion (WITHOUT) can be introduced later. It would > still be backward compatible. > > Regarding "CREATE TABLE ks.tb_copy LIKE ks.tb WITH compaction = {'class': > '.... } AND ... ", I think it only overrides the table options. The CEP > suggests the coarse-grained keyword for each category like table options, > indexes, etc. The functionality provided is not identical. > > I understand that the suggestions are to make operators' life easier by > achieving table creation in a single statement. What is being proposed in > the CEP seems to be at a good balance point. Operators can alter the table > options if needed in the follow-up ALTER table statement. > > - Yifan > > > > On Thu, Oct 17, 2024 at 1:41 PM Štefan Miklošovič <smikloso...@apache.org> > wrote: > > I think we are starting to complicate it. For me the most important > question is who is actually this feature for? If people want to just > prototype something fast or they just want to have "the same table just > under a different name", I think that is going to be used in 99% of cases. > > > My assumption of using WITH which I think I proposed first (4th post in > this thread) was to just blindly copy the most important "parts" logically > related to a table, be it indices, materialized views, or triggers and > enable / disable them as we wish. If no "WITH" is used, then we just get a > table with nothing else. "WITH" will opt-in into that. > > Seeing us contemplating using "INCLUDING" and "EXCLUDING" on individual > options makes me sad a little bit. I think we are over-engineering this. I > just don't see a reasonable use-case where users would need to cherry-pick > what they want and what not. Isn't that just too complicated? If a table > being copied drifts away too much from the original one then users would be > better off with creating a brand new table with CQL as they are used to, > not dealing with "copying" at all. More we drift from what the original > table was like, the less useful this feature is. > > On Wed, Oct 16, 2024 at 10:03 PM Dave Herrington <he...@rhinosource.com> > wrote: > > Sorry that I overlooked the definition of the default in the CEP. I did > look for it but I didn’t see it. > > I think the default behavior you explained makes perfect sense & what one > would expect. > > I like the flexibility of INCLUDING and EXCLUDING that you are considering. > > Would it make sense to use WITH for table options, which would make it > easy (and less confusing IMHO) to override the defaults from the source > table, then use INCLUDING/EXCLUDING for all non-table options such as > constraints and indices? > > It seems this would be easier to document as well, as it could just point > to the CREATE TABLE doc for the options, rather than trying to explain a > bunch of keywords that map to table options. > > -Dave > > David A. Herrington II > President and Chief Engineer > RhinoSource, Inc. > > *Data Lake Architecture, Cloud Computing and Advanced Analytics.* > > www.rhinosource.com > > > On Wed, Oct 16, 2024 at 7:57 PM guo Maxwell <cclive1...@gmail.com> wrote: > > To yifan : > At the beginning of the period, I also thought about adding the keyword > ALL, refer to pg > <https://www.postgresql.org/docs/current/sql-createtable.html> , but I > give up when writing cep as I find that there may be not so many properties > (only three) to copy for C* and > It is possible to decide what is needed and what is not in a very simple > cql, as our ALL is only three properties here. I want to keep it as simple > as possible (based on the advice given by Benjamin), So I grouped > the properties of the table into one category and expressed it with OPTION > keyword. > > But if we are going to split the first keyword OPTION to COMPRESSION > 、COMPACTION、COMMENT and so on. I am +1 on adding ALL back as the properties > are so many and it is simple to use ALL instead of > list all properties. Besides I may change my keyword WITH to INCLUDING and > adding another keyword EXCLUDING to flexibly copy table properties through > simple sql statements, like using 1 not 2 > > > 1. CREATE TABLE newTb like oldTb INCLUDING ALL EXCLUDING INDEXES AND > COMMENTS. > 2. CREATE TABLE newTb like oldTb INCLUDING COMPRESSION CONSTRAINTS > GENERATED IDENTITY STATISTICS STORAGE > > Conclusion: If there may be more keywords to consider in the future, such > as more than 4 , I am +1 on adding ALL back . > > To Dave : > Default behavior is only copy column name, data type ,data mask , you > can see more detail from CEP-43 > <https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-43++Apache+Cassandra+CREATE+TABLE++LIKE> > . > > > Patrick McFadin <pmcfa...@gmail.com> 于2024年10月17日周四 06:43写道: > > +1 That makes much more sense in my experience. > > On Wed, Oct 16, 2024 at 12:12 PM Dave Herrington <he...@rhinosource.com> > wrote: > > I'm coming at this with both a deep ANSI SQL background as well as CQL > background. > > Defining the default behavior is the starting point. What gets copied if > we do "CREATE TABLE new_table LIKE original_table;" without a WITH clause? > > Then, you build on that with the specific WITH options. WITH ALL catches > everything. > > -Dave > > On Wed, Oct 16, 2024 at 11:16 AM Yifan Cai <yc25c...@gmail.com> wrote: > > "WITH ALL" seems to be a natural addition to the directives. What do you > think about adding the fifth keyword ALL to retain all fields of the table > schema? > > For instance, CREATE TABLE new_table LIKE original_table WITH ALL, it > replicates options, indexes, triggers, constraints and any applicable kinds > that are introduced in the future. > > - Yifan > > On Wed, Oct 16, 2024 at 7:46 AM guo Maxwell <cclive1...@gmail.com> wrote: > > Disscussed with Bernardo on slack,and +1 with his advice on adding a > fourth keyword. > > The keyword would be CONSTRAINTS , any more suggestion ? > > guo Maxwell <cclive1...@gmail.com>于2024年10月16日 周三上午9:55写道: > > Hi yifan, > Thanks for bringing this up. The SELECT permission on the original table > is needed. Mysql and PG all have mentioned this, and I also specifically > noticed this in my code. > > I probably missed this in the cep documentation. 😅 > > Yifan Cai <yc25c...@gmail.com> 于2024年10月16日周三 07:46写道: > > Thanks for creating the CEP! I think it is missing Bernardo's comment on > "the need for read permissions on the source table". > > CreateTableStatement does not check the permissions outside of the > enclosing keyspace. Having the SELECT permission on the original table is a > requirement for CREATE TABLE LIKE. > > - Yifan > > On Sun, Sep 29, 2024 at 11:01 PM guo Maxwell <cclive1...@gmail.com> wrote: > > Hello, everyone , > I have finished the doc for CEP-43 for CREATE_TABLE_LIKE > <https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-43++Apache+Cassandra+CREATE+TABLE++LIKE> > as > said before, looking forward to your suggestions. > > Štefan Miklošovič <smikloso...@apache.org> 于2024年9月25日周三 03:51写道: > > I am sorry I do not follow what you mean, maybe an example would help. > > On Tue, Sep 24, 2024 at 6:18 PM guo Maxwell <cclive1...@gmail.com> wrote: > > > If there are multiple schema information changes in one ddl statement, > will there be schema conflicts in extreme cases? > For example, our statement contains both table creation and index creation. > > guo Maxwell <cclive1...@gmail.com>于2024年9月24日 周二下午8:12写道: > > +1 on splitting this task and adding the ability to copy tables through > different keyspaces in the future. > > Štefan Miklošovič <smikloso...@apache.org> 于2024年9月23日周一 22:05写道: > > If we have this table > > CREATE TABLE ks.tb2 ( > id int PRIMARY KEY, > name text > ); > > I can either specify name of an index on my own like this: > > CREATE INDEX name_index ON ks.tb2 (name) ; > > or I can let Cassandra to figure that name on its own: > > CREATE INDEX ON ks.tb2 (name) ; > > in that case it will name that index "tb2_name_idx". > > Hence, I would expect that when we do > > ALTER TABLE ks.to_copy LIKE ks.tb2 WITH INDICES; > > Then ks.to_copy table will have an index which is called > "to_copy_name_idx" without me doing anything. > > For types, we do not need to do anything when we deal with the same > keyspace. For simplicity, I mentioned that we might deal with the same > keyspace scenario only for now and iterate on that in the future. > > On Mon, Sep 23, 2024 at 8:53 AM guo Maxwell <cclive1...@gmail.com> wrote: > > Hello everyone, > > Cep is being written, and I encountered some problems during the process. > I would like to discuss them with you. If you read the description of this > CASSANDRA-7662 <https://issues.apache.org/jira/browse/CASSANDRA-7662>, we > will find that initially the original creator of this jira did not intend > to implement structural copying of indexes, views, and triggers only the > column and its data type. > > However, after investigating some db related syntax and function > implementation, I found that it may be necessary for us to provide some > rich syntax to support the replication of indexes, views, etc. > > In order to support selective copy of the basic structure of the table > (columns and types), table options, table-related indexes, views, triggers, > etc. We need some new syntax, it seems that the syntax of pg is relatively > comprehensive, it use the keyword INCLUDING/EXCLUDING to flexibly control > the removal and retention of indexes, table information, etc. see pg > create table like > <https://www.postgresql.org/docs/8.1/sql-createtable.html> , the new > created index name is different from the original table's index name , > seenewly > copied index names are different from original > <https://github.com/postgres/postgres/blob/master/doc/src/sgml/ref/create_table.sgml#L749> > , the name is based on some rule. > Mysql is relatively simple and copies columns and indexes by default. see > mysql > create table like > <https://dev.mysql.com/doc/refman/8.4/en/create-table-like.html> and the > newly created index name is the same with the original table's index name. > > So for Casandra, I hope it can also support the information copy of index > and even view/trigger. And I also hope to be able to flexibly decide which > information is copied like pg. > > Besides, I think the copy can happen between different keyspaces. And UDT > needs to be taken into account. > > But as we know the index/view/trigger name are all under keyspace level, > so it seems that the newly created index name (or view name/ trigger name) > must be different from the original tables' ,otherwise names would clash . > > So regarding the above problem, one idea I have is that for newly created > types, indexes and views under different keyspaces and the same keyspace, > we first generate random names for them, and then we can add the ability of > modifying the names(for types/indexes/views/triggers) so that users can > manually change the names. > > > guo Maxwell <cclive1...@gmail.com> 于2024年9月20日周五 08:06写道: > > No,I think still need some discuss on grammar detail after I finish the > first version > > Patrick McFadin <pmcfa...@gmail.com>于2024年9月20日 周五上午2:24写道: > > Is this CEP ready for a VOTE thread? > > On Sat, Aug 24, 2024 at 8:56 PM guo Maxwell <cclive1...@gmail.com> wrote: > > Thank you for your replies, I will prepare a CEP later. > > Patrick McFadin <pmcfa...@gmail.com> 于2024年8月20日周二 02:11写道: > > +1 This is a CEP > > On Mon, Aug 19, 2024 at 10:50 AM Jon Haddad <j...@jonhaddad.com> wrote: > > Given the fairly large surface area for this, i think it should be a CEP. > > — > Jon Haddad > Rustyrazorblade Consulting > rustyrazorblade.com > > > On Mon, Aug 19, 2024 at 10:44 AM Bernardo Botella < > conta...@bernardobotella.com> wrote: > > Definitely a nice addition to CQL. > > Looking for inspiration at how Postgres and Mysql do that may also help > with the final design (I like the WITH proposed by Stefan, but I would > definitely take a look at the INCLUDING keyword proposed by Postgres). > https://www.postgresql.org/docs/current/sql-createtable.html > https://dev.mysql.com/doc/refman/8.4/en/create-table-like.html > > On top of that, and as part of the interesting questions, I would like to > add the permissions to the mix. Both the question about copying them over > (with a WITH keyword probably), and the need for read permissions on the > source table as well. > > Bernardo > > > On Aug 19, 2024, at 10:01 AM, Štefan Miklošovič <smikloso...@apache.org> > wrote: > > BTW this would be cool to do as well: > > ALTER TABLE ks.to_copy LIKE ks.tb WITH INDICES; > > This would mean that if we create a copy of a table, later we can decide > that we need indices too, so we might "enrich" that table with indices from > the old one without necessarily explicitly re-creating them on that new > table. > > On Mon, Aug 19, 2024 at 6:55 PM Štefan Miklošovič <smikloso...@apache.org> > wrote: > > I think this is an interesting idea worth exploring. I definitely agree > with Benjamin who raised important questions which needs to be answered > first. Also, what about triggers? > > It might be rather "easy" to come up with something simple but it should > be a comprehensive solution with predictable behavior we all agree on. > > If a keyspace of a new table does not exist we would need to create that > one too before. For the simplicity, I would just make it a must to create > it on same keyspace. We might iterate on that in the future. > > UDTs are created per keyspace so there is nothing to re-create. We just > need to reference it from a new table, right? > > Indexes and MVs are interesting but in theory they might be re-created too. > > Would it be appropriate to use something like this? > > CREATE TABLE ks.tb_copy LIKE ks.tb WITH INDEXES AND VIEWS AND TRIGGERS > .... > > Without "WITH" it would just copy a table with nothing else. > > On Mon, Aug 19, 2024 at 6:10 PM guo Maxwell <cclive1...@gmail.com> wrote: > > Hello, everyone: > As Jira CASSANDRA-7662 > <https://issues.apache.org/jira/browse/CASSANDRA-7662> has described , we > would like to introduce a new grammer " CREATE TABLE LIKE " > ,which simplifies creating new tables duplicating the existing ones . > The format may be like : CREATE TABLE <new_table> LIKE <old_table> > Before I implement this function, do you have any suggestions on this? > > Looking forward to your reply! > > > > > -- > -Dave > > David A. Herrington II > President and Chief Engineer > RhinoSource, Inc. > > *Data Lake Architecture, Cloud Computing and Advanced Analytics.* > > www.rhinosource.com > > > > -- > -Dave > > David A. Herrington II > President and Chief Engineer > RhinoSource, Inc. > > *Data Lake Architecture, Cloud Computing and Advanced Analytics.* > > www.rhinosource.com > > > > -- > -Dave > > David A. Herrington II > President and Chief Engineer > RhinoSource, Inc. > > *Data Lake Architecture, Cloud Computing and Advanced Analytics.* > > www.rhinosource.com > > > > -- > -Dave > > David A. Herrington II > President and Chief Engineer > RhinoSource, Inc. > > *Data Lake Architecture, Cloud Computing and Advanced Analytics.* > > www.rhinosource.com > > >