Dan Burkert has posted comments on this change. Change subject: Kudu Spark Datasource for insertions and updates ......................................................................
Patch Set 4: (8 comments) http://gerrit.cloudera.org:8080/#/c/2992/4/java/kudu-spark/src/main/scala/org/kududb/spark/kudu/DefaultSource.scala File java/kudu-spark/src/main/scala/org/kududb/spark/kudu/DefaultSource.scala: Line 87: new KuduRelation(tableName, kuduMaster)(sqlContext) Should this be returning the same KuduRelation created above on line 77? Line 108: @transient Neither of these @transient tags should be necessary, since KuduRelation should not be serialized. Line 142: val columns = requiredColumns.intersect(primaryKeys) ++ (requiredColumns diff primaryKeys) //kudu requires the primary key to be the first columns retrieved The column ordering requirement has been fixed as of https://github.com/apache/incubator-kudu/commit/7d57b8b9204a0b170fc204cb7aaa98ad995cf635 http://gerrit.cloudera.org:8080/#/c/2992/4/java/kudu-spark/src/main/scala/org/kududb/spark/kudu/KuduContext.scala File java/kudu-spark/src/main/scala/org/kududb/spark/kudu/KuduContext.scala: Line 91: def tableExists(tableName: String) = syncClient.tableExists(tableName) Is this returning a boolean? I think return type is mandatory in Scala. Line 98: def deleteTable(tableName: String) = syncClient.deleteTable(tableName) same here Line 110: options: CreateTableOptions): Unit = { Unit is redundant Line 156: tableName: String, indentation Line 160: val columns = (table.getSchema.getColumns.asScala).map(_.getName()) // gets the columns in order so the primary key is first This shouldn't be necessary anymore. I think the most efficient thing to do is to set the Kudu PartialRow fields by index instead of by column name, this will save a bunch of internal hash map lookups which you could do once at the start. -- To view, visit http://gerrit.cloudera.org:8080/2992 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I74f20a6c17c47f424dfda62854963dc19c3b78c3 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Chris George <[email protected]> Gerrit-Reviewer: Andy Grove <[email protected]> Gerrit-Reviewer: Dan Burkert <[email protected]> Gerrit-Reviewer: Kudu Jenkins Gerrit-HasComments: Yes
