Dan Burkert has posted comments on this change.

Change subject: add support for persisting Spark DataFrames to Kudu
......................................................................
Patch Set 1: (3 comments)

http://gerrit.cloudera.org:8080/#/c/2969/1/java/kudu-spark/src/main/scala/org/kududb/spark/kudu/KuduContext.scala
File java/kudu-spark/src/main/scala/org/kududb/spark/kudu/KuduContext.scala:

Line 29: import org.kududb.client.{PartialRow, _}
Could you list the imports out explicitly?

Line 103: val session: KuduSession = client.newSession
Put the session into batch mode by calling "session.setFlushMode(FlushMode.AUTO_FLUSH_BACKGROUND)". Without this the session flushes each insert individually, which really hurts performance.

Line 125: session.close()
This should probably be in a finally block, just in case. I don't think anything above can throw, but it's hard to know. Also, check the returned OperationResponses to make sure there aren't any errors. I'm not sure what Spark SQL's mechanism for handling errors is; maybe an exception?

--
To view, visit http://gerrit.cloudera.org:8080/2969
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Icdb58f4e707fa273ae50b93276c426ad77522e3b
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andy Grove <[email protected]>
Gerrit-Reviewer: Dan Burkert <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-HasComments: Yes
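[Editor's note] Taken together, the three comments describe one write pattern: batch the session, close it in a finally block, and check the responses for row errors. A rough sketch of that pattern in Scala, assuming the org.kududb-era Kudu Java client API (the `FlushMode` enum is nested in `SessionConfiguration` in the clients I know of; `writeRows` and its parameters are hypothetical names, not from the patch):

```scala
import org.kududb.client._
import scala.collection.JavaConverters._

// Hypothetical helper illustrating the reviewer's suggestions; building the
// Operations from DataFrame rows is elided.
def writeRows(client: KuduClient, operations: Seq[Operation]): Unit = {
  val session: KuduSession = client.newSession()
  // Batch mode: buffer operations and flush in the background instead of
  // round-tripping to the tablet server for every insert.
  session.setFlushMode(SessionConfiguration.FlushMode.AUTO_FLUSH_BACKGROUND)
  try {
    operations.foreach(session.apply)
  } finally {
    // close() flushes any remaining buffered operations and returns their
    // responses; surface per-row errors instead of silently dropping them.
    val responses = session.close().asScala
    val failed = responses.filter(r => r != null && r.hasRowError)
    if (failed.nonEmpty) {
      throw new RuntimeException(
        s"failed to write ${failed.size} rows; first error: ${failed.head.getRowError}")
    }
  }
}
```

Throwing from the helper matches the reviewer's guess that an exception is the likeliest way to report errors back through Spark SQL, but that choice is an assumption here, not something the thread settles.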
