Re: How to write NaN using BigQuerySink in Python?

Asha Rostamianfar Mon, 18 Sep 2017 11:07:07 -0700

Thanks for the quick response, Cham.

In my use case (supporting the VCF
<https://samtools.github.io/hts-specs/VCFv4.2.pdf> format), each value in
the repeated sequence has an associated context. In other words, the index
of each value determines its context, and some values may
be null, so [0, None, 1] is different from [None, 0, 1]. Having a generic
'default' value is also not ideal as the context may change between fields
(unless we use something like sys.maxint). Your suggestion of using
repeated records would work, but it has the drawback of complicating the
schema.
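
For reference, here is a minimal sketch of what the repeated-record
workaround might look like on the Python side. The field names ('calls',
'value') and the helper functions are hypothetical, and the schema fragment
is only an assumed layout in the BigQuery JSON schema style; it is not taken
from the Beam code.

```python
import json


def encode_repeated(values):
    """Wrap each element in a record with a single nullable field.

    A None element becomes {'value': None}, so its position in the
    list is preserved.
    """
    return [{'value': v} for v in values]


def decode_repeated(records):
    """Inverse of encode_repeated: unwrap the records back into a list."""
    return [r.get('value') for r in records]


# Hypothetical schema fragment: a REPEATED RECORD with one NULLABLE field.
CALLS_FIELD = {
    'name': 'calls',
    'type': 'RECORD',
    'mode': 'REPEATED',
    'fields': [{'name': 'value', 'type': 'INTEGER', 'mode': 'NULLABLE'}],
}

row = {'calls': encode_repeated([0, None, 1])}

# None serializes to JSON null, so strict JSON encoding succeeds.
encoded_row = json.dumps(row, allow_nan=False)

# NaN, by contrast, is rejected by strict JSON encoding.
try:
    json.dumps({'x': float('nan')}, allow_nan=False)
    nan_rejected = False
except ValueError:
    nan_rejected = True
```

The cost is the one already noted: every numeric list turns into a list of
one-field records, and queries have to reach through the extra level.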


Anyhow, it doesn't seem like there is an easy solution, but please let me
know if you have any other thoughts on this.

Thanks again,
Asha

On Mon, Sep 18, 2017 at 1:41 PM, Chamikara Jayalath <chamik...@apache.org>
wrote:

> NaN and Inf values are not JSON compliant and hence not supported.  We use
> JSON BigQuery load when writing to BigQuery using DataflowRunner.
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py#L155
>
> Other values including 'None' are supported. Why do you need to record
> 'None' values for a repeated integer field? Can you update the table
> schema to support your use case? For example,
>
> * maintaining a count of None values in a separate field
> * defining a repeated field for a record type with one nullable field
>
> - Cham
>
>
>
> On Mon, Sep 18, 2017 at 10:08 AM Asha Rostamianfar
> <arost...@google.com.invalid> wrote:
>
> > Is there a way to write 'NaN' to BigQuery using the
> > Python beam.io.BigQuerySink?
> >
> > It complains that NaN is not supported in JSON if I try using
> > float('NaN').
> >
> > Context: given that null values are not supported in repeated fields for
> > BigQuery (e.g. having [0, None, 1]), I'd like to find a way to represent
> > 'None' values for numeric types. I thought using NaN may be a good
> > workaround if possible. Any 'special' value would work for this purpose
> > actually.
> >
> > Thanks,
> > Asha
> >
>
