I'm going to write a Bigtable table for BeamSQL and I have a question about the
schema design: which of the following would be preferable?

Bigtable stores its data in tables whose rows contain a key and a 3-dimensional
array: the 1st dimension is column families (each with a name), the 2nd
dimension is columns (each with a qualifier), and the 3rd is cells, each
containing a timestamp and a value.

Two design solutions come to mind:
1) Fix the schema to be the generic Bigtable row:

Row(key, Array(Row(family, Array(Row(qualifier, Array(Row(value, 
timestamp)))))))

Then the table creation statement would always be of the form:

CREATE TABLE bigtableexample1()
TYPE 'bigtable'
LOCATION 
'https://googleapis.com/bigtable/projects/projectId/instances/instanceId/tables/tableId'
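
With this generic schema, reading a specific column would probably look
something like the sketch below. The field names ("families", "columns",
"cells") and the family/qualifier values ("cf", "name") are placeholders I made
up, and it assumes Calcite-style UNNEST over nested ARRAY<ROW<...>> fields
works in BeamSQL:

-- Sketch only: field names and family/qualifier values are placeholders,
-- and UNNEST over nested arrays is assumed to be supported.
SELECT t.key, c.value, c.timestamp
FROM bigtableexample1 AS t,
     UNNEST(t.families) AS f,
     UNNEST(f.columns) AS col,
     UNNEST(col.cells) AS c
WHERE f.family = 'cf'
  AND col.qualifier = 'name'

The query is more verbose, but it would keep working unchanged when new
families or columns show up.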

2) Let the user define the schema by listing the desired families and columns,
something like:
CREATE TABLE bigtableexample2(
  key VARCHAR,
  family1 ROW<
    column1 ROW<
      cells ARRAY<ROW<
        value VARCHAR,
        timestamp BIGINT
      >>
    >,
    column2 ROW<
      cells ARRAY<ROW<
        value VARCHAR,
        timestamp BIGINT
      >>
    >
  >
)
TYPE 'bigtable'
LOCATION 
'https://googleapis.com/bigtable/projects/projectId/instances/instanceId/tables/tableId'
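
With the user-defined schema, the same read would be shorter, something like
this (again just a sketch, assuming dot access into nested ROW fields and
UNNEST over the cells array are supported):

-- Sketch only: assumes nested ROW field access and UNNEST work in BeamSQL.
SELECT t.key, c.value, c.timestamp
FROM bigtableexample2 AS t,
     UNNEST(t.family1.column1.cells) AS c

On the other hand, any new family or column would require changing the CREATE
TABLE statement.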

To me the 1st approach is more user friendly (typing out the schema for the 2nd
would be troublesome) and more flexible, especially when the row's schema
(families and columns) changes and a user wants to perform 'SELECT * FROM
bigtableexampleX'.

WDYT? I'd welcome any feedback. Maybe there is some 3rd option that would be
better?
