[jira] [Commented] (KUDU-1945) Support generation of surrogate primary keys (or tables with no PK)

ASF subversion and git services (Jira) Thu, 26 Jan 2023 08:33:06 -0800


    [ https://issues.apache.org/jira/browse/KUDU-1945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17681033#comment-17681033 ]


ASF subversion and git services commented on KUDU-1945:
-------------------------------------------------------

Commit 345fd44ca3b0ecd6e80b4e554cac0d8cb230ccd7 in kudu's branch 
refs/heads/master from Marton Greber
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=345fd44ca ]

KUDU-1945 Auto-Incrementing Column, C++ client

This patch adds the initial part of the C++ client-side changes for the
auto-incrementing column feature.

A new KuduColumnSpec property, NonUniquePrimaryKey, is added.
Semantically, it behaves like PrimaryKey:
- only one column can be marked NonUniquePrimaryKey in a given
  KuduSchemaBuilder context,
- if such a column exists, it must be defined first,
- compound keys are defined through a setter function.
Functionally, non-unique primary keys don't need to satisfy the
uniqueness constraint. Once a non-unique primary key is specified, an
auto-incrementing column is added automatically in the background. The
non-unique key columns and the auto-incrementing column together form
the effective primary key.
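To make the effective-key semantics concrete, here is a minimal in-memory model, assuming a single tablet with one server-assigned counter. `ToyTablet`, `EffectiveKey`, and the counter starting at 1 are hypothetical illustrations, not the Kudu client API; in a real table the schema is declared through KuduColumnSpec/KuduSchemaBuilder and the server assigns the auto-incrementing values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <limits>
#include <map>
#include <string>
#include <utility>

// Hypothetical model, not the Kudu API: the effective primary key is the
// (non-unique key, auto-incrementing id) pair.
using EffectiveKey = std::pair<std::string, int64_t>;

class ToyTablet {
 public:
  // Inserts a row. Duplicate non-unique keys are allowed because the
  // tablet-assigned id makes every effective key distinct.
  EffectiveKey Insert(const std::string& non_unique_key,
                      const std::string& value) {
    EffectiveKey key{non_unique_key, ++next_id_};
    rows_.emplace(key, value);
    return key;
  }

  std::size_t RowCount() const { return rows_.size(); }

  // Rows sharing a non-unique key are contiguous in the sorted map, so a
  // range scan over all possible ids finds them.
  std::size_t CountWithKey(const std::string& non_unique_key) const {
    auto lo = rows_.lower_bound(
        {non_unique_key, std::numeric_limits<int64_t>::min()});
    auto hi = rows_.upper_bound(
        {non_unique_key, std::numeric_limits<int64_t>::max()});
    return static_cast<std::size_t>(std::distance(lo, hi));
  }

 private:
  int64_t next_id_ = 0;  // hypothetical per-tablet counter
  std::map<EffectiveKey, std::string> rows_;
};
```

For example, inserting two rows with the same timestamp key yields two distinct effective keys that differ only in the auto-incrementing id.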

Some technical notes:
- The name of the auto-incrementing column is hardcoded in the Schema
  class. It is a reserved column name; users can't create columns with
  it. On the client-facing side, this reserved string is available
  through KuduSchema::GetAutoIncrementingColumnName().
- This initial version does not support UPSERT and UPSERT_IGNORE
  operations.
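The two technical notes above can be sketched as client-side checks. This is a hypothetical illustration: `kReservedColumnName`, `OpType`, `ValidateUserColumns`, and `IsOpSupported` are invented for the example, and the placeholder string only stands in for the real reserved name, which should be obtained from KuduSchema::GetAutoIncrementingColumnName() rather than hardcoded.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Placeholder only: real code should call
// KuduSchema::GetAutoIncrementingColumnName() instead of hardcoding a name.
const std::string kReservedColumnName = "<auto-incrementing column name>";

// Hypothetical subset of write operation types, for illustration.
enum class OpType { kInsert, kUpdate, kDelete, kUpsert, kUpsertIgnore };

// Rejects user-defined columns that reuse the reserved name.
bool ValidateUserColumns(const std::vector<std::string>& names) {
  for (const auto& name : names) {
    if (name == kReservedColumnName) return false;
  }
  return true;
}

// Mirrors the initial limitation: UPSERT and UPSERT_IGNORE are not
// supported on tables that carry an auto-incrementing column.
bool IsOpSupported(OpType op, bool has_auto_incrementing_column) {
  if (!has_auto_incrementing_column) return true;
  return op != OpType::kUpsert && op != OpType::kUpsertIgnore;
}
```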

As suggested in the server-side patch [1], a dedicated builder is added
to construct the column spec for the auto-incrementing column. Since
this pins down the column's properties, I took the liberty of removing
unit tests and checks that verify these properties.

[1] https://gerrit.cloudera.org/#/c/19097/

Change-Id: Ic133e3d44cc56c8351e33d95b523ed7b6b13617b
Reviewed-on: http://gerrit.cloudera.org:8080/19272
Reviewed-by: Alexey Serbin
Reviewed-by: Abhishek Chennaka
Tested-by: Alexey Serbin


> Support generation of surrogate primary keys (or tables with no PK)
> -------------------------------------------------------------------
>
>                 Key: KUDU-1945
>                 URL: https://issues.apache.org/jira/browse/KUDU-1945
>             Project: Kudu
>          Issue Type: New Feature
>          Components: client, master, tablet
>            Reporter: Todd Lipcon
>            Priority: Major
>              Labels: roadmap-candidate
>
> Many use cases have data where there is no "natural" primary key. For 
> example, a web log use case mostly cares about partitioning and not about 
> precise sorting by timestamp, and timestamps themselves are not necessarily 
> unique. Rather than forcing users to come up with their own surrogate primary 
> keys, Kudu should support some kind of "auto_increment" equivalent which 
> generates primary keys on insertion. Alternatively, Kudu could support tables 
> which are partitioned but not internally sorted.
> The advantages would be:
> - Kudu can pick primary keys on insertion to guarantee that no 
> compaction is required on the table (e.g., always assign a new key higher than any 
> existing key in the local tablet). This can improve write throughput 
> substantially, especially compared to naive PK generation schemes that a user 
> might pick, such as UUIDs, which would generate a uniformly random insert 
> workload (the worst case for performance).
> - Make Kudu easier to use for such use cases (no extra client code necessary)
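The first advantage can be illustrated with a toy count of how many new keys land after every existing key (pure appends) under each scheme. This is a hypothetical sketch: uniformly random 64-bit integers stand in for UUIDs, the function names are invented, and it models only key ordering, not Kudu's actual storage or compaction behavior.

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <set>

// Counts how many generated keys are strictly greater than every key
// already present, i.e. appends to the end of the sorted key space.
template <typename NextKey>
int CountAppends(int n, NextKey next_key) {
  std::set<uint64_t> keys;
  int appends = 0;
  for (int i = 0; i < n; ++i) {
    uint64_t k = next_key();
    if (keys.empty() || k > *keys.rbegin()) ++appends;
    keys.insert(k);
  }
  return appends;
}

// Monotonic assignment: each new key is higher than any existing key.
int MonotonicAppends(int n) {
  uint64_t counter = 0;
  return CountAppends(n, [&counter] { return ++counter; });
}

// Uniformly random keys, standing in for randomly generated UUIDs.
int RandomAppends(int n, uint64_t seed) {
  std::mt19937_64 rng(seed);
  return CountAppends(n, [&rng] { return static_cast<uint64_t>(rng()); });
}
```

With monotonic assignment every insert is an append; with random keys only a small fraction of inserts exceed the running maximum, so most writes land somewhere in the middle of the existing key range.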



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
