[
https://issues.apache.org/jira/browse/BEAM-9502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yaron Neuman updated BEAM-9502:
-------------------------------
Description:
Since fe4b7794, _Schema.equals_ compares only the UUIDs, for faster comparison.
Since 0b3b18c6, _SchemaCoder_ assigns a random UUID when schema.uuid is null.
As a result, when updating (--update) a Dataflow job that uses row schemas in
user code, the pipeline compatibility check fails on the second run (the
update) because SchemaCoder generates a different random UUID.
The user can set the UUID after creating the Schema, but not through
Schema.Builder, and I'm afraid most users, who are not aware of the internal
implementation, won't do that.
In my branch, I added _.withUUID_ and _.withRandomUUID_ to _Schema.Builder_,
but I think a better solution would be to calculate the UUID from the schema
itself.
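To make the idea of a schema-derived UUID concrete, here is a rough sketch. It
is not Beam code: the class SchemaUuids and the method deterministicUuid are
hypothetical names, and a schema is modelled only as an ordered map of field
name to type name. It relies on the JDK's UUID.nameUUIDFromBytes, which
returns a name-based (type 3) UUID for a given byte array.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

public class SchemaUuids {

    // Hypothetical helper (not part of Beam): derives a stable UUID from an
    // ordered map of field name -> type name. The same fields in the same
    // order always produce the same UUID.
    public static UUID deterministicUuid(Map<String, String> fields) {
        StringBuilder canonical = new StringBuilder();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            // Length-prefix each part so that different field lists cannot
            // collapse into the same canonical string.
            canonical.append(field.getKey().length()).append(':').append(field.getKey())
                     .append('=')
                     .append(field.getValue().length()).append(':').append(field.getValue())
                     .append(';');
        }
        byte[] bytes = canonical.toString().getBytes(StandardCharsets.UTF_8);
        // Name-based UUID: a deterministic function of the input bytes.
        return UUID.nameUUIDFromBytes(bytes);
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("id", "INT64");
        fields.put("name", "STRING");

        // Two independently built but identical field lists give equal UUIDs.
        UUID first = deterministicUuid(fields);
        UUID second = deterministicUuid(new LinkedHashMap<>(fields));
        System.out.println(first.equals(second));
    }
}
```

With something along these lines, two runs that build the same schema would
end up with the same UUID, so a UUID-only equality check would still pass on
update. How nested rows, logical types, and field options should feed into
the canonical form is left open here.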
Any thoughts?
[~reuvenlax]
was:
Since fe4b7794, _Schema.equals_ compares only the UUIDs, for faster comparison.
Since 0b3b18c6, _SchemaCoder_ assigns a random UUID when schema.uuid is null.
As a result, when updating a Dataflow job that uses row schemas in user code,
the pipeline compatibility check fails on the second run (the update) because
SchemaCoder generates a different random UUID.
The user can set the UUID after creating the Schema, but not through
Schema.Builder, and I'm afraid most users, who are not aware of the internal
implementation, won't do that.
In my branch, I added _.withUUID_ and _.withRandomUUID_ to _Schema.Builder_,
but I think a better solution would be to calculate the UUID from the schema
itself.
Any thoughts?
[~reuvenlax]
> SchemaCoder assigns random UUID, causes Dataflow's compatibility check to fail
> ------------------------------------------------------------------------------
>
> Key: BEAM-9502
> URL: https://issues.apache.org/jira/browse/BEAM-9502
> Project: Beam
> Issue Type: Bug
> Components: runner-dataflow, sdk-java-core
> Reporter: Yaron Neuman
> Priority: Minor
>
> Since fe4b7794, _Schema.equals_ compares only the UUIDs, for faster
> comparison.
> Since 0b3b18c6, _SchemaCoder_ assigns a random UUID when schema.uuid is null.
> As a result, when updating (--update) a Dataflow job that uses row schemas in
> user code, the pipeline compatibility check fails on the second run (the
> update) because SchemaCoder generates a different random UUID.
>
> The user can set the UUID after creating the Schema, but not through
> Schema.Builder,
> and I'm afraid most users, who are not aware of the internal
> implementation, won't do that.
>
> In my branch, I added _.withUUID_ and _.withRandomUUID_ to _Schema.Builder_,
> but I think a better solution would be to calculate the UUID from the
> schema itself.
> Any thoughts?
> [~reuvenlax]
>