Sorry for the delay. I had some issues updating the schema and ended up
having to drop it and re-create it for some reason. Looks like SQL PostCommit
is green on https://github.com/apache/beam/pull/10765 now.

> setting up from scratch is a good idea.
+1, I filed https://issues.apache.org/jira/browse/BEAM-9260 for this and
assigned it to Kenn for now.

> I’m still feeling it’s ok to read an Integer as Long.
In this case the issue is that we were reading data with a schema (id:
INT64), and then passing it to a PAssert that checked against Rows with a
different schema (id: INT32). So I think it's a legitimate error because we
were asserting that a PCollection contained rows with one schema but it
actually had a different schema. The message should definitely be better in
this case though.
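
To illustrate why the values can match while the assertion still fails, here
is a minimal plain-Python sketch. It is not Beam's actual Row or PAssert code;
`Field`, `Row`, and `describe_mismatch` are made-up names, and the only
assumption is that row equality takes the schema into account. It also shows
the kind of message I'd prefer to see.

```python
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass(frozen=True)
class Field:
    name: str
    type: str  # e.g. "INT32", "INT64", "STRING"


@dataclass(frozen=True)
class Row:
    # Hypothetical stand-in for a schema-aware row: equality covers
    # both the schema and the values.
    schema: Tuple[Field, ...]
    values: Tuple[Any, ...]


def describe_mismatch(expected: Row, actual: Row) -> str:
    """Return a readable reason why two rows differ, or "" if they are equal."""
    if expected == actual:
        return ""
    if len(expected.schema) != len(actual.schema):
        return "schema mismatch: different number of fields"
    for i, (exp_f, act_f) in enumerate(zip(expected.schema, actual.schema)):
        if exp_f != act_f:
            return (
                f"schema mismatch at field {i}: "
                f"expected {exp_f.name}:{exp_f.type}, got {act_f.name}:{act_f.type}"
            )
    return f"value mismatch: expected {expected.values}, got {actual.values}"


# Same value, different declared type for `id`.
expected = Row((Field("id", "INT32"),), (1,))
actual = Row((Field("id", "INT64"),), (1,))

assert expected != actual  # the rows differ only in their schema
print(describe_mismatch(expected, actual))
```

Reporting the schema difference before comparing values is what would have
made the failure obvious here.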

Brian

On Wed, Feb 5, 2020 at 9:28 PM Tomo Suzuki <suzt...@google.com> wrote:

> (My understanding)
> The test ensures that the CSV data stored in GCS is readable through
> Datacatalog. It fails because an Integer value in the CSV was read as a
> Long, as per Datacatalog.
>
>
> > setting up from scratch is a good idea.
>
> I agree. Furthermore, it would be nice if it can test different type-cast
> behaviors. I’m still feeling it’s ok to read an Integer as Long. (If this
> is the case, how about Long to Integer? What if the long is small enough to
> fit in 32 bits? and so on)
>
> On Wed, Feb 5, 2020 at 23:15 Kenneth Knowles <k...@apache.org> wrote:
>
>> I think that was me... sorry!
>>
>> Is this a test where it is important that the data is pre-existing?
>> Otherwise I would say that setting up from scratch is a good idea. Does
>> anyone have context on it? I am happy to take on the small bit of coding,
>> since I broke it.
>>
>> Kenn
>>
>> On Wed, Feb 5, 2020 at 1:22 PM Brian Hulette <bhule...@google.com> wrote:
>>
>>> So it looks like the schema for `integ_test_small_csv_test_1` was
>>> updated yesterday around the same time that PR#10563 went in, and it no
>>> longer matches the schema we expect in the test.
>>>
>>> I'm just going to change it back for now. I am curious who changed it
>>> and why; if the perpetrator is on this list, please let us know :)
>>>
>>>
>>> Note the updateTime:
>>> ```
>>> ❯ gcloud beta data-catalog entries lookup \
>>>   '`datacatalog`.`entry`.`apache-beam-testing`.`us-central1`.`samples`.`integ_test_small_csv_test_1`'
>>> gcsFilesetSpec:
>>>   filePatterns:
>>>   - gs://apache-beam-samples/integration_test_small_csv/test.csv
>>> linkedResource: //datacatalog.googleapis.com/projects/apache-beam-testing/locations/us-central1/entryGroups/samples/entries/integ_test_small_csv_test_1
>>> name: projects/apache-beam-testing/locations/us-central1/entryGroups/samples/entries/integ_test_small_csv_test_1
>>> schema:
>>>   columns:
>>>   - column: id
>>>     mode: NULLABLE
>>>     type: INT64
>>>   - column: name
>>>     mode: NULLABLE
>>>     type: STRING
>>>   - column: type
>>>     mode: NULLABLE
>>>     type: STRING
>>> sourceSystemTimestamps:
>>>   createTime: '2019-08-16T01:49:06.235Z'
>>>   updateTime: '2020-02-04T17:18:17.671Z'
>>> type: FILESET
>>> ```
>>>
>> --
> Regards,
> Tomo
>
