[ 
https://issues.apache.org/jira/browse/KUDU-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yingchun Lai updated KUDU-3353:
-------------------------------
    Description: 
h1. Motivation

In some usage scenarios, a Kudu table has a column with "create time" 
semantics: it holds the timestamp at which the row was created. The other 
columns keep their usual semantics, for example user properties such as age 
and address.

Neither the upstream system nor the Kudu user knows whether a row already 
exists, and every cell holds the latest value ingested from, for example, an 
event stream.

Without a "create time" column, the Kudu user can write data to the table 
with UPSERT operations, and every column that carries data overwrites the old 
value. With a "create time" column, however, the cell is overwritten by each 
subsequent UPSERT op, which is not what we expect.

To achieve the goal, we have to read the column first to check whether it is 
NULL. If it is NULL, we can include the cell in the row; if it is not NULL, 
we drop the cell from the data before the UPSERT, to avoid overwriting 
"create time".
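
The read-before-write workaround can be sketched as follows. This is only an 
illustration against a hypothetical in-memory table (a Python dict), not the 
Kudu client API; the names table, read_create_time and 
upsert_keeping_create_time are assumptions made for the example.

```python
from typing import Any, Dict, Optional

Row = Dict[str, Any]

# Hypothetical in-memory stand-in for a Kudu table, keyed by primary key.
table: Dict[str, Row] = {}
reads = 0  # counts the extra reads the workaround needs


def read_create_time(key: str) -> Optional[int]:
    """Stand-in for reading the "create time" column of one row."""
    global reads
    reads += 1
    row = table.get(key)
    return None if row is None else row.get("create_time")


def upsert(key: str, cells: Row) -> None:
    """Plain UPSERT: every supplied cell overwrites the stored one."""
    table.setdefault(key, {}).update(cells)


def upsert_keeping_create_time(key: str, cells: Row) -> None:
    """Read first, then drop "create_time" from the write if already set."""
    if read_create_time(key) is not None:
        cells = {k: v for k, v in cells.items() if k != "create_time"}
    upsert(key, cells)


upsert_keeping_create_time("user1", {"create_time": 100, "age": 30})
upsert_keeping_create_time("user1", {"create_time": 200, "age": 31})
print(table["user1"], reads)  # {'create_time': 100, 'age': 31} 2
```

Every write costs one additional read in this sketch.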

This read is expensive. Is there a way to avoid reading from Kudu?

h1. Resolution

We can implement a column schema attribute with "update if null" semantics. 
That means cell data in the changelist updates the base data if the latter is 
NULL, and is ignored if the base data is not NULL.

This way we can use Kudu as before, and only need to define the column as 
"update if null" when creating the table or adding the column.
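
The intended "update if null" behavior could look like the minimal sketch 
below. It models a row as a Python dict and is not Kudu code; apply_upsert 
and its update_if_null parameter are hypothetical names used only to 
illustrate the semantics.

```python
from typing import Any, Dict, Set

Row = Dict[str, Any]


def apply_upsert(base: Row, update: Row, update_if_null: Set[str]) -> Row:
    """Merge `update` into `base`, honoring "update if null" columns."""
    merged = dict(base)
    for column, value in update.items():
        if column in update_if_null and merged.get(column) is not None:
            continue  # base cell is already set: ignore the update
        merged[column] = value
    return merged


row: Row = {}
events = ({"create_time": 100, "age": 30}, {"create_time": 200, "age": 31})
for event in events:
    row = apply_upsert(row, event, update_if_null={"create_time"})
print(row)  # {'create_time': 100, 'age': 31}
```

The first event fills "create_time" because the base cell is NULL; the second 
event only updates "age", and no read is needed on the writer side.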

 

  was:
h1. Motivation

In some usage scenarios, a Kudu table has a column with "create time" 
semantics: it holds the timestamp at which the row was created. The other 
columns keep their usual semantics, for example user properties such as age 
and address.

Neither the upstream system nor the Kudu user knows whether a row already 
exists, and every cell holds the latest value ingested from, for example, an 
event stream.

Without a "create time" column, the Kudu user can write data to the table 
with UPSERT operations, and every column that carries data overwrites the old 
value. With a "create time" column, however, the cell is overwritten by each 
subsequent UPSERT op, which is not what we expect.

To achieve the goal, we have to 


> Support setnx semantic on column
> --------------------------------
>
>                 Key: KUDU-3353
>                 URL: https://issues.apache.org/jira/browse/KUDU-3353
>             Project: Kudu
>          Issue Type: New Feature
>          Components: api, server
>            Reporter: Yingchun Lai
>            Priority: Major
>
> h1. Motivation
> In some usage scenarios, a Kudu table has a column with "create time" 
> semantics: it holds the timestamp at which the row was created. The other 
> columns keep their usual semantics, for example user properties such as age 
> and address.
> Neither the upstream system nor the Kudu user knows whether a row already 
> exists, and every cell holds the latest value ingested from, for example, 
> an event stream.
> Without a "create time" column, the Kudu user can write data to the table 
> with UPSERT operations, and every column that carries data overwrites the 
> old value. With a "create time" column, however, the cell is overwritten by 
> each subsequent UPSERT op, which is not what we expect.
> To achieve the goal, we have to read the column first to check whether it 
> is NULL. If it is NULL, we can include the cell in the row; if it is not 
> NULL, we drop the cell from the data before the UPSERT, to avoid 
> overwriting "create time".
> This read is expensive. Is there a way to avoid reading from Kudu?
> h1. Resolution
> We can implement a column schema attribute with "update if null" semantics. 
> That means cell data in the changelist updates the base data if the latter 
> is NULL, and is ignored if the base data is not NULL.
> This way we can use Kudu as before, and only need to define the column as 
> "update if null" when creating the table or adding the column.
>  



