Thanks Reuven. Using SchemaCoder is indeed better, to avoid losing the type
information.
Etienne
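For reference, the Integer-vs-Long widening described below can be reproduced
without Beam. This is a minimal sketch (not Beam's actual code) that mimics the
narrowest-type selection a JSON deserializer such as Jackson applies by default
when a JSON number carries no type information; `decodeNumber` is a
hypothetical helper introduced only for illustration:

```java
public class JsonNumberWidening {

    // Sketch of the deserializer behavior: a JSON integral number is boxed
    // into the narrowest Java type that can hold it, regardless of whether
    // the producer originally set a long.
    static Object decodeNumber(long v) {
        if (v >= Integer.MIN_VALUE && v <= Integer.MAX_VALUE) {
            return (int) v;   // boxed as Integer
        }
        return v;             // boxed as Long
    }

    public static void main(String[] args) {
        // A small value round-trips as Integer, not Long.
        System.out.println(decodeNumber(42L).getClass().getSimpleName());
        // A value beyond Integer.MAX_VALUE stays a Long.
        System.out.println(decodeNumber(1L << 40).getClass().getSimpleName());
    }
}
```

So `new TableRow().set("f", 42L)` followed by an encode/decode through a JSON
coder can come back as `Integer` 42, which is exactly why equality asserts in
write-then-read tests fail.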
On Thursday, June 14, 2018 at 10:04 -0700, Reuven Lax wrote:
> I think Thomas Groh hit this issue and might know a workaround.
> In general, TableRowJsonCoder has been a huge pain, partly because JSON
> itself cannot always represent all types (numeric types are a constant
> source of trouble in JSON). In addition, I've found that encoding all data
> into JSON (which is space-inefficient) is quite expensive when shuffling
> that data (and BigQueryIO does do a GroupByKey on TableRows). I'm working
> on a PR that will extract schema information and allow BigQueryIO to use
> SchemaCoder instead of TableRowJsonCoder, but it is not quite ready to be
> merged yet.
>
> Reuven
> On Wed, Jun 13, 2018 at 1:54 AM Etienne Chauchot <echauc...@apache.org> wrote:
> > Hi all,
> >
> > While playing with BigQueryIO I noticed something.
> >
> > When we create a TableRow (e.g. in a row function in BigQueryIO) using new
> > TableRow().set(), a long, for example, gets boxed into a Long. But when it
> > is encoded using TableRowJsonCoder and then re-read, it may be decoded as
> > an Integer if the value fits into an Integer. This causes assertion
> > failures in write-then-read tests.
> > What I did for now is to downcast the long to an int, to force it to be
> > boxed into an Integer (because the test value fits into an Integer) at
> > TableRow creation.
> >
> > Is there a way to fix it in TableRowJsonCoder or a better workaround?
> >
> > Etienne