I've dug into the topic a bit more and I think the 2nd approach will fit better. The schema of a Bigtable table is not supposed to change that often, and having the user specify the schema explicitly is more SQL-like and will cause fewer potential problems.
On 2020/11/03 11:01:57, Piotr Szuberski <piotr.szuber...@polidea.com> wrote:
> I'm going to write a Bigtable table for BeamSQL and I have a question about the
> schema design: which one would be preferable?
>
> Bigtable stores its data in a table with rows that contain a key and a
> 3-dimensional array, where the 1st dimension is families (with names), the 2nd
> dimension is columns (with qualifiers) and the 3rd is cells, each containing a
> timestamp and a value.
>
> Two design solutions come to mind:
> 1) Fix the schema to be a generic Bigtable row:
>
> Row(key, Array(Row(family, Array(Row(qualifier, Array(Row(value, timestamp)))))))
>
> Then the table creation definition would always be in the form:
>
> CREATE TABLE bigtableexample1()
> TYPE 'bigtable'
> LOCATION
> 'https://googleapis.com/bigtable/projects/projectId/instances/instanceId/tables/tableId'
>
> 2) Let the user design their schema by providing the desired families and
> columns, something like:
>
> CREATE TABLE bigtableexample2(
>   key VARCHAR,
>   family1 ROW<
>     column1 ROW<
>       cells ARRAY<ROW<
>         value VARCHAR,
>         timestamp BIGINT
>       >>
>     >,
>     column2 ROW<
>       cells ARRAY<ROW<
>         value VARCHAR,
>         timestamp BIGINT
>       >>
>     >
>   >
> )
> TYPE 'bigtable'
> LOCATION
> 'https://googleapis.com/bigtable/projects/projectId/instances/instanceId/tables/tableId'
>
> For me the 1st approach is more user-friendly (typing the schema from the 2nd
> would be troublesome) and more flexible, especially when the row's schema
> (families and columns) changes and a user wants to perform 'SELECT * from
> bigtableexampleX'.
>
> WDYT? I'd welcome any feedback. Maybe there is some 3rd option that would be
> a better one?
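To make the trade-off between the two shapes concrete, here is a minimal sketch in plain Python (not Beam or Bigtable client code). It models one Bigtable row as nested dicts (family -> qualifier -> list of (value, timestamp) cells) and maps that row to both proposed schemas. `BigtableRow`, `to_generic_row` and `to_declared_row` are hypothetical names used only for illustration. For approach 2 the sketch assumes that a declared column missing from the row becomes NULL (None) and that undeclared families and columns are silently dropped. The actual table implementation may behave differently.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical in-memory model of one Bigtable row (not a real client type):
# family name -> column qualifier -> list of (value, timestamp) cells.
Cells = List[Tuple[str, int]]


@dataclass
class BigtableRow:
    key: str
    families: Dict[str, Dict[str, Cells]]


def to_generic_row(row: BigtableRow) -> dict:
    """Approach 1: one fixed shape for every table,
    Row(key, Array(Row(family, Array(Row(qualifier, Array(Row(value, timestamp)))))))."""
    return {
        "key": row.key,
        "families": [
            {
                "family": family,
                "columns": [
                    {
                        "qualifier": qualifier,
                        "cells": [{"value": v, "timestamp": ts} for v, ts in cells],
                    }
                    for qualifier, cells in columns.items()
                ],
            }
            for family, columns in row.families.items()
        ],
    }


def to_declared_row(row: BigtableRow, schema: Dict[str, List[str]]) -> dict:
    """Approach 2: project the row onto the user-declared families/columns.

    Assumption: a declared column absent from the row becomes None (SQL NULL);
    families and columns that were not declared are dropped.
    """
    result: dict = {"key": row.key}
    for family, qualifiers in schema.items():
        columns = row.families.get(family, {})
        result[family] = {
            q: (
                {"cells": [{"value": v, "timestamp": ts} for v, ts in columns[q]]}
                if q in columns
                else None
            )
            for q in qualifiers
        }
    return result


# Example row: column3 exists in Bigtable but is not declared in the DDL.
row = BigtableRow(
    key="user#1",
    families={"family1": {"column1": [("a", 2), ("b", 1)], "column3": [("x", 5)]}},
)

if __name__ == "__main__":
    print(to_generic_row(row))
    print(to_declared_row(row, {"family1": ["column1", "column2"]}))
```

The generic shape keeps column3, so 'SELECT *' would still see a newly added column. The declared shape exposes only column1 and column2 (with column2 as NULL), which is the more SQL-like, fixed-schema behaviour argued for above.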