Sorry for the delay. I had some issues updating the schema and ended up having to drop and re-create it for some reason. It looks like SQL PostCommit is green on https://github.com/apache/beam/pull/10765 now.

> setting up from scratch is a good idea.

+1, I filed https://issues.apache.org/jira/browse/BEAM-9260 for this and assigned it to Kenn for now.

> I’m still feeling it’s ok to read an Integer as Long.

In this case the issue is that we were reading data with a schema (id: INT64), and then passing it to a PAssert that checked against Rows with a different schema (id: INT32). So I think it's a legitimate error because we were asserting that a PCollection contained rows with one schema but it actually had a different schema. The message should definitely be better in this case though.

Brian

On Wed, Feb 5, 2020 at 9:28 PM Tomo Suzuki <suzt...@google.com> wrote:

> (My understanding)
> The test ensures the CSV data stored in GCS should be readable through
> Datacatalog. It fails because an Integer value in the CSV was read as Long
> as per Datacatalog.
>
> > setting up from scratch is a good idea.
>
> I agree. Furthermore, it would be nice if it can test different type-cast
> behaviors. I’m still feeling it’s ok to read an Integer as Long. (If this
> is the case, how about Long to Integer? What if the long is small enough to
> fit in 32 bits? and so on)
>
> On Wed, Feb 5, 2020 at 23:15 Kenneth Knowles <k...@apache.org> wrote:
>
>> I think that was me... sorry!
>>
>> Is this a test where it is important that the data is pre-existing?
>> Otherwise I would say that setting up from scratch is a good idea. Does
>> anyone have context on it? I am happy to take on the small bit of coding,
>> since I broke it.
>>
>> Kenn
>>
>> On Wed, Feb 5, 2020 at 1:22 PM Brian Hulette <bhule...@google.com> wrote:
>>
>>> So it looks like the schema for `integ_test_small_csv_test_1` was
>>> updated yesterday around the same time that PR#10563 went in, and it no
>>> longer matches the schema we expect in the test.
>>>
>>> I'm just going to change it back for now. I am curious who changed it
>>> and why, if the perpetrator is on this list please let us know :)
>>>
>>>
>>> Note the updateTime:
>>> ```
>>> ❯ gcloud beta data-catalog entries lookup '`datacatalog`.`entry`.`apache-beam-testing`.`us-central1`.`samples`.`integ_test_small_csv_test_1`'
>>> gcsFilesetSpec:
>>>   filePatterns:
>>>   - gs://apache-beam-samples/integration_test_small_csv/test.csv
>>> linkedResource: //datacatalog.googleapis.com/projects/apache-beam-testing/locations/us-central1/entryGroups/samples/entries/integ_test_small_csv_test_1
>>> name: projects/apache-beam-testing/locations/us-central1/entryGroups/samples/entries/integ_test_small_csv_test_1
>>> schema:
>>>   columns:
>>>   - column: id
>>>     mode: NULLABLE
>>>     type: INT64
>>>   - column: name
>>>     mode: NULLABLE
>>>     type: STRING
>>>   - column: type
>>>     mode: NULLABLE
>>>     type: STRING
>>> sourceSystemTimestamps:
>>>   createTime: '2019-08-16T01:49:06.235Z'
>>>   updateTime: '2020-02-04T17:18:17.671Z'
>>> type: FILESET
>>> ```
>>>
>>
> --
> Regards,
> Tomo
>
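
For anyone following along, here is a rough sketch of why the assertion discussed at the top of this thread fails even when the values look identical. It is plain Python, not Beam code: `Field`, `Row`, `describe_mismatch`, and `make_row` are hypothetical names made up for illustration, and the row values are invented rather than taken from the real CSV. The assumption it models is that row equality includes the schema, so a row with (id: INT64) never equals one with (id: INT32). It also shows the kind of message that might make such a failure easier to read.

```python
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass(frozen=True)
class Field:
    name: str
    type: str  # e.g. "INT32", "INT64", "STRING"


@dataclass(frozen=True)
class Row:
    # Hypothetical stand-in for a schema-aware row: the generated __eq__
    # compares the schema as well as the values.
    schema: Tuple[Field, ...]
    values: Tuple[Any, ...]


def describe_mismatch(expected: Row, actual: Row) -> str:
    """Return a readable description of schema differences, or "" if none."""
    if len(expected.schema) != len(actual.schema):
        return f"expected {len(expected.schema)} fields, got {len(actual.schema)}"
    problems = []
    for want, got in zip(expected.schema, actual.schema):
        if want.name != got.name:
            problems.append(f"expected field '{want.name}', got '{got.name}'")
        elif want.type != got.type:
            problems.append(
                f"field '{want.name}': expected {want.type}, got {got.type}"
            )
    return "; ".join(problems)


def make_row(id_type: str) -> Row:
    # Column names follow the catalog entry quoted above; values are invented.
    schema = (Field("id", id_type), Field("name", "STRING"), Field("type", "STRING"))
    return Row(schema, (1, "example", "test"))


expected = make_row("INT32")  # what the test asserted
actual = make_row("INT64")    # what the updated catalog schema produced

print(expected == actual)                   # same values, different schema
print(describe_mismatch(expected, actual))  # names the offending field
```

Whether an INT32 expectation should be allowed to match an INT64 value is a separate design question; the sketch only models the strict comparison described in the reply.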