[
https://issues.apache.org/jira/browse/KUDU-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Grant Henke resolved KUDU-1227.
-------------------------------
Fix Version/s: NA
Resolution: Fixed
Marking as resolved given that UPSERT exists. We can open another Jira to track
operators like SUM/MAX/APPEND_STRING/etc. for UPDATEs.
> Update with Merge Operation on Non-Key columns
> ----------------------------------------------
>
> Key: KUDU-1227
> URL: https://issues.apache.org/jira/browse/KUDU-1227
> Project: Kudu
> Issue Type: New Feature
> Components: client, tserver
> Reporter: Salvatore
> Priority: Trivial
> Fix For: NA
>
>
> It would be fantastic if the following use case could be rolled up into a
> single operation.
> Step 1: Client applies an Insert
> Step 2: Client receives OperationResponse with RowError with Status
> ALREADY_PRESENT
> Step 3: Client retrieves row for key columns specified in Step 1
> Step 4: Client locally merges non-key column values from Step 1, with non-key
> column values retrieved in Step 3, with some merge operation (Details and
> examples below)
> Step 5: Client applies an Update with merged values
> Step 6: Client receives OperationResponse with no RowError
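The six steps above can be sketched as follows. This is a minimal illustration that uses a hypothetical in-memory map in place of a real table; `ReadMergeUpdateSketch`, `Row`, `Status`, and `roundTrips` are names invented for this example and are not part of the Kudu client API. It only shows how many client/server exchanges the pattern costs when the key already exists.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for a table; this is NOT the Kudu client API.
final class ReadMergeUpdateSketch {
    enum Status { OK, ALREADY_PRESENT }

    // Non-key columns of the sample table: times_seen, first_seen, last_seen.
    static final class Row {
        final int timesSeen;
        final long firstSeen;
        final long lastSeen;

        Row(int timesSeen, long firstSeen, long lastSeen) {
            this.timesSeen = timesSeen;
            this.firstSeen = firstSeen;
            this.lastSeen = lastSeen;
        }
    }

    private final Map<String, Row> table = new HashMap<>();
    int roundTrips = 0; // counts simulated client/server exchanges

    Status insert(String key, Row row) {
        roundTrips++;
        return table.putIfAbsent(key, row) == null ? Status.OK : Status.ALREADY_PRESENT;
    }

    Row get(String key) {
        roundTrips++;
        return table.get(key);
    }

    Status update(String key, Row row) {
        roundTrips++;
        table.put(key, row);
        return Status.OK;
    }

    // Steps 1-6: try the insert, and fall back to read + local merge + update.
    Row insertOrMerge(String key, Row incoming) {
        if (insert(key, incoming) == Status.OK) {
            return incoming;                       // plain insert, one round trip
        }
        Row existing = get(key);                   // Step 3
        Row merged = new Row(                      // Step 4
                existing.timesSeen + incoming.timesSeen,
                Math.min(existing.firstSeen, incoming.firstSeen),
                Math.max(existing.lastSeen, incoming.lastSeen));
        update(key, merged);                       // Steps 5-6
        return merged;
    }
}
```

In this sketch an existing key costs three exchanges instead of one, and nothing prevents another client from changing the row between the read and the update.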
> Merge operation details: I'm suggesting a few possible merge operations,
> noting that all are order-independent (starting from the value already in
> the table, whether present or null, applying the following operations in any
> order gives the same result).
> So assuming a key of some unique identifier, or product code:
> SUM: Useful for counting the number of times this combination of key columns
> has been seen before
> MAX: Useful for setting timestamp values (newest), or highest price/value for
> an item seen
> MIN: Useful for setting timestamp values (oldest), or lowest price/value for
> an item seen
> SUB(TRACT): I don't have a particularly strong use case for a subtracting
> counter, unless you want some sort of countdown or thresholding of scores
> (something happens when you reach zero, or a negative score)
> Sample table, for example, might be one with four columns:
> STRING KEY unique_identifier, INT times_seen, TIMESTAMP first_seen, TIMESTAMP
> last_seen
> Consider streaming a set of unique_identifiers to be stored in a Kudu table
> as a lookup service, where the client could perform an Operation along the
> lines of:
> Merge merge = table.newMerge("times_seen:SUM", "first_seen:MIN",
> "last_seen:MAX")
> and then setting the values in the PartialRow for this Operation with, for
> example:
> "abc", 2, 1445495695517000, 1445495708867000
> Which would result in one of two things -
> If key "abc" is not present in the table, it would simply be a plain insert
> OR
> If key "abc" is present in the table, 2 would be added to
> $times_seen_in_table column, first_seen column would be the result of
> min($first_seen_in_table, 1445495695517000) and last_seen would be the result
> of max($last_seen_in_table, 1445495708867000).
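The insert-or-merge semantics described above can be sketched in a few lines. This is an illustration under stated assumptions, not an existing Kudu feature: `MergeSketch`, `MergeOp`, and `row` are hypothetical names, the table is an in-memory map, all non-key columns are treated as 64-bit integers, and every non-key column is assumed to have an operator configured. The operator names follow the operation list in the description (SUM, MIN, MAX, SUB).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongBinaryOperator;

// Hypothetical sketch of the proposed merge semantics; not the Kudu client API.
final class MergeSketch {
    enum MergeOp {
        SUM((stored, incoming) -> stored + incoming),
        MIN(Math::min),
        MAX(Math::max),
        SUB((stored, incoming) -> stored - incoming);

        private final LongBinaryOperator fn;

        MergeOp(LongBinaryOperator fn) { this.fn = fn; }

        // The stored value is always the first operand.
        long apply(long stored, long incoming) { return fn.applyAsLong(stored, incoming); }
    }

    private final Map<String, MergeOp> ops = new HashMap<>();
    private final Map<String, Map<String, Long>> rows = new HashMap<>();

    // Accepts specs in the "column:OP" form used in the description.
    MergeSketch(String... specs) {
        for (String spec : specs) {
            String[] parts = spec.split(":");
            ops.put(parts[0], MergeOp.valueOf(parts[1]));
        }
    }

    // Convenience builder for the sample table's non-key columns.
    static Map<String, Long> row(long timesSeen, long firstSeen, long lastSeen) {
        Map<String, Long> r = new HashMap<>();
        r.put("times_seen", timesSeen);
        r.put("first_seen", firstSeen);
        r.put("last_seen", lastSeen);
        return r;
    }

    // Returns false for a plain insert and true when an existing row was merged.
    boolean merge(String key, Map<String, Long> incoming) {
        Map<String, Long> stored = rows.get(key);
        if (stored == null) {
            rows.put(key, new HashMap<>(incoming));
            return false;
        }
        for (Map.Entry<String, Long> e : incoming.entrySet()) {
            MergeOp op = ops.get(e.getKey());
            // Map.merge stores the incoming value as-is when the column is null.
            stored.merge(e.getKey(), e.getValue(), op::apply);
        }
        return true;
    }

    Map<String, Long> get(String key) { return rows.get(key); }
}
```

The boolean return value stands in for the suggestion that the response could report whether the operation was a plain insert or a merge.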
> So ideally, the flow would be:
> Step 1: Client applies a Merge
> Step 2: OperationResponse is returned to the client with no RowError. It
> might be good to have the OperationResponse say whether it was a plain
> insert or the result of a merge, but that's not strictly necessary.
> This would save many, many failing inserts, gets, and updates back and forth
> between servers and clients on constantly updating datasets, really playing
> to Kudu's strengths even more.
> Constraints:
> For the merge operations, assuming that TServers are thread-safe for each key
> and apply these atomically, the operations must be order-independent; given a
> value N in a table, with two merges of values A and B in quick succession:
> E.g. SUM: (N + A) + B = (N + B) + A
> or MIN/MAX: max(max(N, A), B) = max(max(N, B), A)
> or SUB: (N - A) - B = (N - B) - A (noting that N is always first operand)
> Another constraint would be that the Merge must contain values for all key
> columns, ensuring a single row is inserted/affected, although I suppose if an
> Insert was happening anyway, this would be true regardless.
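The identities in the constraint above say that the result must not depend on the order in which two merges reach the server. That is slightly different from textbook associativity, which subtraction does not satisfy. A small check makes the distinction concrete; `OrderIndependenceCheck` and its methods are hypothetical names used only for this illustration, and a check on sample values is not a proof.

```java
import java.util.function.LongBinaryOperator;

// Hypothetical helper comparing the two properties on concrete values.
final class OrderIndependenceCheck {
    // The property required by the constraint: (N op A) op B == (N op B) op A.
    static boolean orderIndependent(LongBinaryOperator op, long n, long a, long b) {
        long aThenB = op.applyAsLong(op.applyAsLong(n, a), b);
        long bThenA = op.applyAsLong(op.applyAsLong(n, b), a);
        return aThenB == bThenA;
    }

    // Textbook associativity: (N op A) op B == N op (A op B).
    static boolean associative(LongBinaryOperator op, long n, long a, long b) {
        long left = op.applyAsLong(op.applyAsLong(n, a), b);
        long right = op.applyAsLong(n, op.applyAsLong(a, b));
        return left == right;
    }
}
```

With N = 10, A = 3, B = 4, subtraction passes the first check ((10 - 3) - 4 = (10 - 4) - 3 = 3) but fails the second ((10 - 3) - 4 = 3, while 10 - (3 - 4) = 11). A plain overwrite, where the incoming value replaces the stored one, fails the first check, which is why it would not qualify as a merge operation under this constraint.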