> I don't think BigQuery offers a way to automatically create datasets when
> writing.

Exactly, and it also makes sense from the standpoint of the BigQuery
permissions model. In BigQuery, table creation requires that the service
account have privileges at the dataset level, but dataset creation requires
that the service account be essentially an administrator for the entire
project. Given that model, I don't expect dynamic dataset creation to be a
common enough use case to warrant special support in BigQueryIO. It would
certainly be interesting to hear from others with such needs, though.

I concur with Chamikara that it likely makes sense to perform this in a
ParDo before your BigQueryIO.write transform. You'd probably want to list
the datasets in the project on startup and cache them. Then, for any message
that comes in needing a new dataset, make the appropriate API call using the
BigQuery SDK and update the cache. You'd likely also need error handling for
the race condition where a different node has already successfully created
the dataset.
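
A minimal sketch of that cache-and-create logic, in plain Java so it stays
self-contained. DatasetApi and DatasetExistsException are hypothetical
stand-ins, not part of Beam or the BigQuery SDK; in a real pipeline the
interface would wrap whatever BigQuery client calls you use to list and
create datasets, and a DatasetEnsurer would be built in the DoFn's setup and
called once per element.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical abstraction over the BigQuery client (an assumption, not a real API). */
interface DatasetApi {
  /** Returns the IDs of all datasets currently in the project. */
  Iterable<String> listDatasetIds();

  /** Creates the dataset; throws DatasetExistsException if it already exists. */
  void createDataset(String datasetId) throws DatasetExistsException;
}

/** Hypothetical exception standing in for the service's "already exists" error. */
class DatasetExistsException extends Exception {
  DatasetExistsException(String datasetId) {
    super("Dataset already exists: " + datasetId);
  }
}

/** Per-worker cache of known datasets that creates missing ones on demand. */
class DatasetEnsurer {
  private final DatasetApi api;
  private final Set<String> known = ConcurrentHashMap.newKeySet();

  /** Lists the project's datasets once, on startup, to seed the cache. */
  DatasetEnsurer(DatasetApi api) {
    this.api = api;
    for (String id : api.listDatasetIds()) {
      known.add(id);
    }
  }

  /** Returns true if this call created the dataset, false if it already existed. */
  boolean ensureExists(String datasetId) {
    if (known.contains(datasetId)) {
      return false; // cache hit: no API call needed
    }
    boolean created;
    try {
      api.createDataset(datasetId);
      created = true;
    } catch (DatasetExistsException e) {
      // Race: another worker created the dataset after our startup listing.
      created = false;
    }
    known.add(datasetId); // either way, the dataset exists now
    return created;
  }
}
```

Each worker keeps its own cache here, so the worst case is one redundant
create call per worker per dataset, which the exception handler absorbs.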

On Wed, Dec 2, 2020 at 10:27 PM Chamikara Jayalath <[email protected]>
wrote:

> The functionality does not come from BigQueryIO itself; it just exposes
> the existing BigQuery feature CreateDisposition [1]. I don't think BigQuery
> offers a way to automatically create datasets when writing.
> Is it possible to create such datasets from a ParDo in your pipeline that
> precedes the BigQueryIO write transform?
>
> Thanks,
> Cham
>
> [1]
> https://cloud.google.com/bigquery/docs/reference/auditlogs/rest/Shared.Types/CreateDisposition
>
> On Wed, Dec 2, 2020 at 3:20 PM Vasu Gupta <[email protected]> wrote:
>
>> Hey folks, why isn't there any capability for creating datasets
>> automatically, just like tables, in BigQueryIO? At our company we have a
>> dynamic dataset architecture, which means that as packets arrive, we
>> need to create new datasets and tables on the fly. Since BigQueryIO already
>> has functionality for creating tables automatically, we were wondering
>> whether similar functionality for datasets could be implemented in
>> BigQueryIO.
>>
>
