[ https://issues.apache.org/jira/browse/KUDU-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16411550#comment-16411550 ]

Dan Burkert commented on KUDU-2250:
-----------------------------------

This issue comes up quite a bit. We could make this easier by adding a flag to the Kudu/Spark integration which will cause the UPSERT/UPDATE operations to skip null values.

> Document odd interaction between upserts and Spark Datasets
> -----------------------------------------------------------
>
>                 Key: KUDU-2250
>                 URL: https://issues.apache.org/jira/browse/KUDU-2250
>             Project: Kudu
>          Issue Type: Task
>          Components: spark
>    Affects Versions: 1.6.0
>            Reporter: Jean-Daniel Cryans
>            Assignee: Fengling Wang
>            Priority: Major
>              Labels: newbie
>
> We need to document a specific behavior of Spark Datasets that runs contrary
> to how Kudu works.
> Say you have 3 columns "k, x, y" where k is the primary key.
> You run a first insert on a row "k=1, x=2, y=3".
> Now you upsert "k=1, y=4".
> Using any Kudu API, the full row would now be "k=1, x=2, y=4" but with
> Datasets you have "k=1, x=*NULL*, y=4". This means that Datasets put a null
> value when some columns aren't specified.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
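
For reference, the scenario in the issue description can be modeled with a small sketch. This is a toy model only: plain Python dicts stand in for a Kudu table, and no Kudu or Spark API is called. The function names (`upsert_partial`, `upsert_full_row`, `upsert_skip_nulls`) are hypothetical, and the last one is only a guess at how the flag suggested in the comment might behave.

```python
# Toy model: a dict keyed by primary key stands in for a Kudu table.
# None of these functions exist in Kudu or Spark; they are illustrative only.

SCHEMA = ("k", "x", "y")  # k is the primary key


def upsert_partial(table, key_col, row):
    """Kudu-style upsert: only the columns present in `row` are written."""
    key = row[key_col]
    existing = table.get(key, {})
    table[key] = {**existing, **row}


def upsert_full_row(table, key_col, schema, row):
    """Dataset-style upsert: every schema column is written,
    so a column missing from `row` is stored as None (NULL)."""
    key = row[key_col]
    table[key] = {col: row.get(col) for col in schema}


def upsert_skip_nulls(table, key_col, schema, row):
    """Assumed behavior of the proposed flag: drop null columns,
    then fall back to a partial upsert."""
    non_null = {col: row[col] for col in schema if row.get(col) is not None}
    upsert_partial(table, key_col, non_null)


# Any Kudu API: insert k=1, x=2, y=3, then upsert k=1, y=4.
kudu_api = {}
upsert_partial(kudu_api, "k", {"k": 1, "x": 2, "y": 3})
upsert_partial(kudu_api, "k", {"k": 1, "y": 4})

# Spark Dataset: the second write carries x as NULL.
dataset = {}
upsert_full_row(dataset, "k", SCHEMA, {"k": 1, "x": 2, "y": 3})
upsert_full_row(dataset, "k", SCHEMA, {"k": 1, "y": 4})

# Dataset row with the hypothetical skip-nulls behavior.
skip_nulls = {}
upsert_full_row(skip_nulls, "k", SCHEMA, {"k": 1, "x": 2, "y": 3})
upsert_skip_nulls(skip_nulls, "k", SCHEMA, {"k": 1, "x": None, "y": 4})

print(kudu_api[1])    # {'k': 1, 'x': 2, 'y': 4}
print(dataset[1])     # {'k': 1, 'x': None, 'y': 4}
print(skip_nulls[1])  # {'k': 1, 'x': 2, 'y': 4}
```

One consequence of skipping nulls in this model is that a write can no longer set a column to NULL on purpose, which is presumably why the comment proposes it as an opt-in flag.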