[jira] [Updated] (BEAM-9502) SchemaCoder assigns random UUID, causes Dataflow's compatibility check to fail

Yaron Neuman (Jira) Fri, 13 Mar 2020 12:20:52 -0700


[ https://issues.apache.org/jira/browse/BEAM-9502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Yaron Neuman updated BEAM-9502:
-------------------------------
    Description: 
After fe4b7794, _Schema.equals_ compares only the UUIDs, for faster comparison.
After 0b3b18c6, _SchemaCoder_ forces a random UUID when schema.uuid is null.

Thus, when trying to update (--update) a Dataflow job that defines row schemas in
user code, the pipeline compatibility check fails in the second run (the update),
because SchemaCoder produces a different random UUID.
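To make the failure mode concrete, here is a minimal self-contained model in plain Java. It is not Beam code: `ModelSchema`, `encodeWithCoder` and `buildInRun` are hypothetical stand-ins that mimic the two behaviours described above, namely equality by UUID only and a random UUID assigned by the coder when none is set.

```java
import java.util.Objects;
import java.util.UUID;

public class RandomUuidMismatch {
    // Hypothetical stand-in for a schema: a field description plus an optional UUID.
    public static final class ModelSchema {
        final String fields;
        UUID uuid; // null until something assigns it

        ModelSchema(String fields) {
            this.fields = fields;
        }

        // Mimics the "compare only the UUIDs" fast path; the fields are ignored.
        @Override
        public boolean equals(Object o) {
            return o instanceof ModelSchema && Objects.equals(uuid, ((ModelSchema) o).uuid);
        }

        @Override
        public int hashCode() {
            return Objects.hashCode(uuid);
        }
    }

    // Mimics the coder forcing a random UUID when the schema has none.
    public static ModelSchema encodeWithCoder(ModelSchema schema) {
        if (schema.uuid == null) {
            schema.uuid = UUID.randomUUID();
        }
        return schema;
    }

    // Each "run" rebuilds the same schema from user code, as the original job and the update do.
    public static ModelSchema buildInRun() {
        return encodeWithCoder(new ModelSchema("id:INT64,name:STRING"));
    }

    public static void main(String[] args) {
        ModelSchema originalJob = buildInRun();
        ModelSchema updatedJob = buildInRun();
        // Same fields, different random UUIDs, so the UUID-only comparison fails.
        System.out.println(originalJob.equals(updatedJob)); // false
    }
}
```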

 

The user can set the UUID after creating the Schema, but not through Schema.Builder,
and I'm afraid most users, who are not aware of the internal implementation,
won't do that.
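The workaround mentioned above amounts to pinning one UUID in user code so that both the original job and the update see the same value. Beam's Java `Schema` has a `setUUID(UUID)` setter for this (worth verifying against the SDK version in use); the sketch below models the idea with a hypothetical `ModelSchema`, and the pinned UUID value is arbitrary and purely illustrative.

```java
import java.util.Objects;
import java.util.UUID;

public class PinnedUuidWorkaround {
    // Hypothetical model schema with UUID-only equality, as described in the issue.
    public static final class ModelSchema {
        final String fields;
        private UUID uuid;

        ModelSchema(String fields) {
            this.fields = fields;
        }

        // Named after Beam's Schema.setUUID; here it only stores the value.
        void setUUID(UUID uuid) {
            this.uuid = uuid;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof ModelSchema && Objects.equals(uuid, ((ModelSchema) o).uuid);
        }

        @Override
        public int hashCode() {
            return Objects.hashCode(uuid);
        }
    }

    // A constant kept in user code, so every run (original job and --update) uses it.
    static final UUID PINNED = UUID.fromString("3f2b8c1e-7d4a-4e9b-9c2f-1a6d5e8b0c47");

    public static ModelSchema buildInRun() {
        ModelSchema schema = new ModelSchema("id:INT64,name:STRING");
        schema.setUUID(PINNED); // set right after construction, before any coder sees it
        return schema;
    }

    public static void main(String[] args) {
        System.out.println(buildInRun().equals(buildInRun())); // true
    }
}
```

The catch, as the issue notes, is that nothing forces users to do this, and forgetting it only surfaces at update time.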

 

In my branch, I added _.withUUID_ and _.withRandomUUID_ to _Schema.Builder_.
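A rough idea of what builder-level UUID methods could look like is sketched below. This is a hypothetical shape, not the code from the branch mentioned above and not Beam's actual `Schema.Builder`; `BuilderUuidSketch`, `Builder` and `ModelSchema` are illustrative names, and fields are simplified to `name:type` strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

public class BuilderUuidSketch {
    // Hypothetical immutable schema produced by the builder below.
    public static final class ModelSchema {
        public final List<String> fields;
        public final UUID uuid; // null if the builder never set one

        ModelSchema(List<String> fields, UUID uuid) {
            this.fields = new ArrayList<>(fields);
            this.uuid = uuid;
        }
    }

    // Hypothetical builder mirroring the proposed withUUID/withRandomUUID methods.
    public static final class Builder {
        private final List<String> fields = new ArrayList<>();
        private UUID uuid;

        public Builder addField(String name, String type) {
            fields.add(name + ":" + type);
            return this;
        }

        // Pin an explicit UUID, e.g. a constant kept stable across job updates.
        public Builder withUUID(UUID uuid) {
            this.uuid = uuid;
            return this;
        }

        // Opt in to a fresh random UUID (breaks --update compatibility, fine for one-off jobs).
        public Builder withRandomUUID() {
            this.uuid = UUID.randomUUID();
            return this;
        }

        public ModelSchema build() {
            return new ModelSchema(fields, uuid);
        }
    }

    public static void main(String[] args) {
        UUID pinned = UUID.fromString("00000000-0000-0000-0000-000000000042");
        ModelSchema schema = new Builder()
                .addField("id", "INT64")
                .addField("name", "STRING")
                .withUUID(pinned)
                .build();
        System.out.println(schema.uuid.equals(pinned)); // true
    }
}
```

Making the random UUID an explicit opt-in keeps the choice visible at the call site, which addresses the concern that users unaware of the internals would otherwise never set one.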

But I think a better solution would be to calculate the UUID from the schema
itself.
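One way to compute the UUID from the schema itself is a name-based UUID over a canonical encoding of the fields, for example with the JDK's `UUID.nameUUIDFromBytes` (a type 3, MD5-based UUID). The sketch below assumes fields are given in declaration order as name/type strings; `ContentDerivedUuid`, `uuidFor` and `fields` are hypothetical names, and anything else that affects compatibility (nullability, nested schemas, options) would have to be folded into the canonical string.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

public class ContentDerivedUuid {
    // Derive a deterministic UUID from a canonical description of the fields, in order.
    public static UUID uuidFor(Map<String, String> fieldsInOrder) {
        StringBuilder canonical = new StringBuilder();
        for (Map.Entry<String, String> field : fieldsInOrder.entrySet()) {
            String name = field.getKey();
            String type = field.getValue();
            // Length-prefix each part so that e.g. ("ab", "c") and ("a", "bc") encode differently.
            canonical.append(name.length()).append(':').append(name)
                     .append(type.length()).append(':').append(type)
                     .append(';');
        }
        // Name-based UUID: the same bytes always give the same UUID, on every run.
        return UUID.nameUUIDFromBytes(canonical.toString().getBytes(StandardCharsets.UTF_8));
    }

    // Helper to build an ordered name -> type map from alternating arguments.
    public static Map<String, String> fields(String... nameTypePairs) {
        Map<String, String> m = new LinkedHashMap<>();
        for (int i = 0; i + 1 < nameTypePairs.length; i += 2) {
            m.put(nameTypePairs[i], nameTypePairs[i + 1]);
        }
        return m;
    }

    public static void main(String[] args) {
        UUID firstRun = uuidFor(fields("id", "INT64", "name", "STRING"));
        UUID updateRun = uuidFor(fields("id", "INT64", "name", "STRING"));
        UUID changed = uuidFor(fields("id", "INT64", "name", "STRING", "ts", "DATETIME"));
        System.out.println(firstRun.equals(updateRun)); // true
        System.out.println(firstRun.equals(changed));   // false
    }
}
```

With this approach _Schema.equals_ can keep its cheap UUID comparison, while the hashing cost is paid once when the schema is built. Structurally identical schemas end up sharing a UUID, which appears to be what the update compatibility check needs.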

Any thoughts?

[~reuvenlax]

 

  was:
After fe4b7794, _Schema.equals_ compares only the UUIDs, for faster comparison.
After 0b3b18c6, _SchemaCoder_ forces a random UUID when schema.uuid is null.

Thus, when trying to update a Dataflow job that defines row schemas in user code,
the pipeline compatibility check fails in the second run (the update), because
SchemaCoder produces a different random UUID.

 

The user can set the UUID after creating the Schema, but not through Schema.Builder,
and I'm afraid most users, who are not aware of the internal implementation,
won't do that.

 

In my branch, I added _.withUUID_ and _.withRandomUUID_ to _Schema.Builder_.

But I think a better solution would be to calculate the UUID from the schema
itself.

Any thoughts?

[~reuvenlax]

 


> SchemaCoder assigns random UUID, causes Dataflow's compatibility check to fail
> ------------------------------------------------------------------------------
>
>                 Key: BEAM-9502
>                 URL: https://issues.apache.org/jira/browse/BEAM-9502
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-dataflow, sdk-java-core
>            Reporter: Yaron Neuman
>            Priority: Minor



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
