I think Thomas Groh hit this issue and might know a workaround.

In general, TableRowJsonCoder has been a huge pain, partly because JSON
itself cannot always represent all types faithfully (numeric types are a
constant source of trouble in JSON). In addition, I've found that encoding
all data into JSON (which is space-inefficient) is quite expensive when
shuffling that data (and BigQueryIO does do a GroupByKey on TableRows). I'm
working on a PR that will extract schema information and allow BigQueryIO
to use SchemaCoder instead of TableRowJsonCoder; however, it is not quite
ready to be merged yet.

Reuven

On Wed, Jun 13, 2018 at 1:54 AM Etienne Chauchot <echauc...@apache.org>
wrote:

> Hi all,
>
> While playing with BigQueryIO I noticed something.
>
> When we create a TableRow (e.g. in a row function in BigQueryIO) using new
> TableRow().set(), a long gets boxed into a Long, for example. But when it
> is encoded using TableRowJsonCoder and then re-read, it may be decoded as
> an Integer if the value fits into an Integer. This causes assertion
> failures in tests such as write-then-read.
> What I did for now is downcast the long to an int to force it to be boxed
> into an Integer (because the test value fits into an Integer) at TableRow
> creation.
>
> Is there a way to fix this in TableRowJsonCoder, or a better workaround?
>
> Etienne
>
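The assertion failures Etienne describes come down to Java boxed-type
equality: a `Long` and an `Integer` are never `equals()`, even when they
hold the same numeric value, so a round trip that silently narrows Long to
Integer breaks any `equals`-based test assertion. A minimal stdlib-only
sketch (no Beam or Jackson dependencies, values chosen for illustration):

```java
public class BoxedEqualityDemo {
    public static void main(String[] args) {
        // What the pipeline puts into the TableRow: a boxed long.
        Object written = Long.valueOf(42L);
        // What a JSON decode may hand back when the value fits in an int.
        Object read = Integer.valueOf(42);

        // Object.equals is type-sensitive: Long(42) is not equal to Integer(42).
        System.out.println(written.equals(read)); // false

        // Comparing by numeric value instead of Object.equals sidesteps the
        // boxing mismatch:
        System.out.println(((Number) written).longValue()
                == ((Number) read).longValue()); // true
    }
}
```

This suggests a test-side alternative to the downcast workaround: compare
TableRow fields as `Number` values rather than via `equals()` on the boxed
objects.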
