Hey Andy,

I think you may have solved this already, but here is how to apply an update after the fact to a prior patch. (Assume you have patches 1, 2, and 3, where patch 3 is logically part of the change in patch 1.) Do an interactive rebase (git rebase -i) and squash the third patch into the first, making sure to retain the Change-Id of the first patch in the git commit message. After the rebase you are left with only two commits again. Then repush both commits to Gerrit. Actually, it is possible to repush only commit 1 to Gerrit in this case (git checkout commit-1-hash; push to Gerrit; git checkout branch-name).
HTH,
Mike

On Fri, May 6, 2016 at 6:47 AM, Andy Grove <[email protected]> wrote:

> Mike,
>
> Thanks. I have amended the original commit to assume the table already
> exists and have mostly addressed Dan's feedback, and then created a second
> commit to add deleteTable/createTable methods. I made CreateTableOptions a
> parameter to createTable, but I expect more may be required.
>
> After doing all of this I realized that I did not address one of Dan's
> comments on the first commit, and I'm not sure how to address that with
> Gerrit, so I'll go ahead and issue a third commit to improve data type
> coverage later today.
>
> I'd like to add some tests too, so I'm working on getting the test suite
> to run at the moment.
>
> Thanks,
>
> Andy.
>
> On Thu, May 5, 2016 at 9:20 PM, Mike Percy <[email protected]> wrote:
>
> > Hey Andy, thanks for posting your code to Gerrit. It's great to see more
> > work progressing on the Spark side of things!
> >
> > Some other stuff you guys might want to consider for the create table
> > feature, in addition to the partitioning details, is the ability to
> > specify the replication factor. In fact, one might want to specify any
> > of the fields in the CreateTableRequestPB
> > <https://github.com/apache/incubator-kudu/blob/master/src/kudu/master/master.proto#L337>:
> >
> > message CreateTableRequestPB {
> >   required string name = 1;
> >   required SchemaPB schema = 2;
> >   optional RowOperationsPB split_rows = 6;
> >   optional PartitionSchemaPB partition_schema = 7;
> >   optional int32 num_replicas = 4;
> > }
> >
> > Ideally you could even specify encodings for the columns (dictionary
> > encoding, etc.) as part of the schema.
> >
> > All of this does seem orthogonal to the DataFrame writing capability,
> > which is probably more straightforward in terms of features. Consider
> > treating these as two separate patches: one that assumes the table
> > already exists, and one that adds the table creation capability.
> > Gerrit will treat two separate commits as two separate code reviews,
> > even if one depends on the other (it can track that). Doing that will
> > make it easier to review and probably faster to get the different
> > parts of it committed.
> >
> > Mike
> >
> > --
> > Mike Percy
> > Software Engineer, Cloudera
> >
> > On Thu, May 5, 2016 at 7:52 PM, Andy Grove <[email protected]> wrote:
> >
> > > Hi Dan,
> > >
> > > Thanks for the feedback. I'm thinking that the PK could be specified
> > > by passing in a tuple of column indices to use for the key (assuming
> > > that composite keys are supported?).
> > >
> > > Maybe it would be useful for KuduContext to have createTable(name:
> > > String, schema: StructType) and dropTable(name: String) methods that
> > > can eventually be wrapped in Spark SQL?
> > >
> > > I'm just getting up to speed with Kudu by trying to run TPC-H via
> > > Spark, and the use case driving the current code was being able to
> > > take existing DataFrames (created by reading CSV files) and save
> > > them to Kudu so that we can then start testing the TPC-H SQL
> > > queries.
> > >
> > > We will be experimenting with both Spark SQL and our own SQL parser /
> > > query planner for Spark, which we are planning on open sourcing too.
> > >
> > > Thanks,
> > >
> > > Andy.
> > >
> > > On Thu, May 5, 2016 at 7:16 PM, Dan Burkert <[email protected]> wrote:
> > >
> > > > Hey Andy,
> > > >
> > > > Thanks for the patch! I left you some specific feedback on the
> > > > Gerrit review, but I want to discuss the high-level approach a
> > > > bit. I think the patch as it's written now is going to have
> > > > limited use, because it doesn't allow for specifying primary keys
> > > > or partitioning, which are critical for correctness and
> > > > performance.
> > > > In the long run we will definitely want to be able to create
> > > > tables through Spark SQL, but perhaps we should start off with
> > > > just inserting/updating rows in existing tables. It would be
> > > > interesting to see how other databases have solved this problem,
> > > > since I'm sure we're not the only ones with configuration options
> > > > on table create. The relational databases in particular must have
> > > > PK options.
> > > >
> > > > - Dan
> > > >
> > > > On Thu, May 5, 2016 at 5:51 PM, Andy Grove <[email protected]> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I'm working with some colleagues at AgilData on Spark/Kudu
> > > > > integration, and we expect to be able to contribute a number of
> > > > > features to the code base.
> > > > >
> > > > > To kick things off, here is a gerrit for discussion that adds
> > > > > support for persisting a DataFrame to a Kudu table. It would be
> > > > > great to hear feedback and feature requests for this capability.
> > > > >
> > > > > http://gerrit.cloudera.org:8080/#/c/2969/
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Andy.
