I've dug the topic a bit and I think the 2nd approach will fit better. The 
schema in Bigtable is not supposed to change that often and specifying our own 
schema is more SQL-like and will cause less potential trouble.

On 2020/11/03 11:01:57, Piotr Szuberski <piotr.szuber...@polidea.com> wrote: 
> I'm going to write Bigtable table for BeamSQL and I have a question about the 
> schema design, which one would be preferrable.
> 
> Bigtable stores its data in a table with rows that contain a key and 
> 3-dimensional array where the 1st dimension is families with a names, 2nd 
> dimension is columns with qualifiers and the 3rd cells containing timestamp 
> and value.
> 
> Two design solutions come to mind:
> 1) Fix schema to be a generic Bigtable row:
> 
> Row(key, Array(Row(family, Array(Row(qualifier, Array(Row(value, 
> timestamp)))))))
> 
> Then the table creation definition would always be in form:
> 
> CREATE TABLE bigtableexample1()
> TYPE 'bigtable'
> LOCATION 
> 'https://googleapis.com/bigtable/projects/projectId/instances/instanceId/tables/tableId'
> 
> 2) Let the user design his schema by providing the desired families and 
> columns it sth like:
> CREATE TABLE bigtableexample2(
>   key VARCHAR,
>   family1 ROW<
>     column1 ROW<
>       cells ARRAY<ROW<
>         value VARCHAR,
>         timestamp BIGINT
>       >>
>     >,
>     column2 ROW<
>       cells ARRAY<ROW<
>         value VARCHAR,
>         timestamp BIGINT
>       >>
>     >
>   >
> )
> TYPE 'bigtable'
> LOCATION 
> 'https://googleapis.com/bigtable/projects/projectId/instances/instanceId/tables/tableId'
> 
> For me the 1st approach is more user friendly (typing schema from the 2nd 
> would be troublesome) and more elastic especially when the row's schema 
> (families and columns) changes and a user wants to perform 'SELECT * from 
> bigtableexampleX'.
> 
> WDYT? I'd welcome any feedback. Maybe there is some 3rd option that will be a 
> better one?
> 

Reply via email to