Alex Van Boxel created BEAM-7999:
------------------------------------
Summary: BigQueryIO.readTableRowsWithSchema() doesn't handle timestamp correctly
Key: BEAM-7999
URL: https://issues.apache.org/jira/browse/BEAM-7999
Project: Beam
Issue Type: Task
Components: io-java-gcp
Affects Versions: 2.14.0, 2.15.0
Reporter: Alex Van Boxel
Assignee: Alex Van Boxel
When the new readTableRowsWithSchema() is used to copy a table (a simple
operation), parsing the timestamp column fails because the conversion assumes
a Double value. BigQuery, however, returns a string such as
"2019-08-16 00:12:00.123456 UTC", and that format isn't handled.
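A minimal sketch of the parsing this report asks for, using plain java.time rather than Beam's actual conversion code. The class name BigQueryTimestamps and its methods are hypothetical. It shows that treating the value as a Double rejects the string form, while a formatter with an optional fraction of up to six digits accepts all three sample values below.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;

// Hypothetical helper, not part of Beam: parses BigQuery's string form of TIMESTAMP.
public class BigQueryTimestamps {
    // Accepts "2019-08-16 00:12:00", "...00:12:00.123" and "...00:12:00.123456",
    // each followed by a literal " UTC" suffix.
    private static final DateTimeFormatter BQ_TIMESTAMP =
        new DateTimeFormatterBuilder()
            .appendPattern("uuuu-MM-dd HH:mm:ss")
            .optionalStart()
            // Fraction with a leading '.', 0 to 6 digits (BigQuery has microsecond precision).
            .appendFraction(ChronoField.NANO_OF_SECOND, 0, 6, true)
            .optionalEnd()
            .appendLiteral(" UTC")
            .toFormatter()
            .withZone(ZoneOffset.UTC);

    /** Parses e.g. "2019-08-16 00:12:00.123456 UTC" into an Instant. */
    public static Instant parse(String value) {
        return Instant.from(BQ_TIMESTAMP.parse(value));
    }

    /** Returns true only if the value can be read as a Double, which the string form cannot. */
    public static boolean parsesAsDouble(String value) {
        try {
            Double.parseDouble(value);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "2019-08-16 00:12:00 UTC",
            "2019-08-16 00:12:00.123 UTC",
            "2019-08-16 00:12:00.123456 UTC"
        };
        for (String s : samples) {
            System.out.println(s + " -> double? " + parsesAsDouble(s) + ", instant: " + parse(s));
        }
    }
}
```

Note that converting the parsed value into a Beam schema DATETIME field would still drop the last three digits of the microsecond sample, since that field type is backed by a millisecond-precision Joda instant.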
*To reproduce:*
Populate a table with:
{code:sql}
INSERT `research.alex.in1` (row_id, f_int64, f_timestamp)
VALUES
(1, 1, '2019-08-16 00:12:00 UTC'),
(2, 2, '2019-08-16 00:12:00.123 UTC'),
(3, 3, '2019-08-16 00:12:00.123456 UTC')
{code}
then run a pipeline that copies it:
{code:java}
pipeline
    .apply(
        BigQueryIO.readTableRowsWithSchema()
            .from("research:alex.in1")
            //.withMethod(BigQueryIO.TypedRead.Method.DIRECT_READ)
    )
    // Inspect is a user-defined DoFn that is not included in this report.
    .apply(ParDo.of(new Inspect()))
    .apply(
        BigQueryIO.writeTableRows()
            .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
            .withMethod(BigQueryIO.Write.Method.FILE_LOADS)
            .useBeamSchema()
            .to("research:alex.out4"));
{code}