Thanks Reuven. Using SchemaCoder is indeed better, to avoid losing the type
information.
Etienne
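For reference, the Integer-vs-Long widening described below can be reproduced
without Beam. This is a minimal sketch (not Beam's actual code) that mimics the
narrowest-type selection a JSON deserializer such as Jackson applies by default
when a JSON number carries no type information; `decodeNumber` is a
hypothetical helper introduced only for illustration:

```java
public class JsonNumberWidening {

    // Sketch of the deserializer behavior: a JSON integral number is boxed
    // into the narrowest Java type that can hold it, regardless of whether
    // the producer originally set a long.
    static Object decodeNumber(long v) {
        if (v >= Integer.MIN_VALUE && v <= Integer.MAX_VALUE) {
            return (int) v;   // boxed as Integer
        }
        return v;             // boxed as Long
    }

    public static void main(String[] args) {
        // A small value round-trips as Integer, not Long.
        System.out.println(decodeNumber(42L).getClass().getSimpleName());
        // A value beyond Integer.MAX_VALUE stays a Long.
        System.out.println(decodeNumber(1L << 40).getClass().getSimpleName());
    }
}
```

So `new TableRow().set("f", 42L)` followed by an encode/decode through a JSON
coder can come back as `Integer` 42, which is exactly why equality asserts in
write-then-read tests fail.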
On Thursday, June 14, 2018 at 10:04 -0700, Reuven Lax wrote:
> I think Thomas Groh hit this issue and might know a workaround.
> In general, TableRowJsonCoder has been a huge pain, partly because JSON
> itself cannot always represent all types (numeric types are a constant
> source of trouble in JSON). In addition, I've found that encoding all data
> into JSON (which is space-inefficient) is quite expensive when shuffling
> that data (and BigQueryIO does do a GroupByKey on TableRows). I'm working
> on a PR that will extract schema information and allow BigQueryIO to use
> SchemaCoder instead of TableRowJsonCoder, but it is not quite ready to be
> merged yet.
>
> Reuven
> On Wed, Jun 13, 2018 at 1:54 AM Etienne Chauchot <echauc...@apache.org> wrote:
> > Hi all,
> >
> > While playing with BigQueryIO I noticed something.
> >
> > When we create a TableRow (e.g. in a row function in BigQueryIO) using new
> > TableRow().set(), a long, for example, gets boxed into a Long. But when it
> > is encoded using TableRowJsonCoder and then re-read, it may be decoded as
> > an Integer if the value fits into an Integer. This causes assertion
> > failures in write-then-read tests.
> > What I did for now is to downcast the long to an int, to force it to be
> > boxed into an Integer (because the test value fits into an Integer) at
> > TableRow creation.
> >
> > Is there a way to fix it in TableRowJsonCoder or a better workaround?
> >
> > Etienne