[ 
https://issues.apache.org/jira/browse/AVRO-4004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17859898#comment-17859898
 ] 

Martin Tzvetanov Grigorov commented on AVRO-4004:
-------------------------------------------------

There might be a bug in the Python and Java implementations, but after making 
the change in the Rust code I get a different fingerprint because of the 
namespace on field "b".

From the spec:
{code:java}
[FULLNAMES] Replace short names with fullnames, using applicable namespaces to 
do so. Then eliminate namespace attributes, which are now redundant. {code}
 
{code:java}
running 1 test
Canonical form: 
"{\"name\":\"test\",\"type\":\"record\",\"fields\":[{\"name\":\"a\",\"type\":\"long\"},{\"name\":\"test.a.b\",\"type\":\"string\"},{\"name\":\"c\",\"type\":{\"type\":\"long\"}}]}"
 {code}
Note \"name\":\"test.a.b\": the field name now carries the "test.a." prefix.
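
The fingerprint is computed over the bytes of the canonical form, so a changed field name alone is enough to change it. A minimal sketch of the 64-bit Rabin fingerprint (CRC-64-AVRO), assuming the algorithm and the EMPTY constant from the spec's "Schema Fingerprints" section; the function and variable names here are illustrative and not taken from any Avro implementation:

```python
# Seed and polynomial constant given in the Avro spec (CRC-64-AVRO).
EMPTY = 0xC15D213AA4D7A795


def _build_table():
    """Precompute the 256-entry lookup table used by the byte-wise loop."""
    table = []
    for i in range(256):
        fp = i
        for _ in range(8):
            # XOR in the constant only when the low bit is set.
            fp = (fp >> 1) ^ (EMPTY if fp & 1 else 0)
        table.append(fp)
    return table


TABLE = _build_table()


def fingerprint64(data: bytes) -> int:
    """Return the 64-bit fingerprint of `data` as an unsigned integer."""
    fp = EMPTY
    for b in data:
        fp = (fp >> 8) ^ TABLE[(fp ^ b) & 0xFF]
    return fp


# Hypothetical fragments that differ only in the field name.
short = b'{"name":"b","type":"string"}'
full = b'{"name":"test.a.b","type":"string"}'
print(f"{fingerprint64(short):016x}")
print(f"{fingerprint64(full):016x}")
```

The sketch returns an integer; implementations may render the eight bytes in a different order when printing, so compare like with like.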

 

PR: [https://github.com/apache/avro/pull/2976]

But I am not sure what to do with the "test_equivalence_after_round_trip" unit 
test. There is no way to reconstruct the schema after a round trip now ...
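
To illustrate why a round trip can become lossy, here is a minimal sketch, assuming the concern is that [STRIP] discards attributes. It keeps only the attributes the spec lists for [STRIP], emits them in [ORDER] order, and collapses a bare {"type": ...} wrapper. The helper names are hypothetical and this is not the Rust, Python, or Java implementation:

```python
import json

# Attributes kept by the spec's [STRIP] rule, listed in [ORDER] order.
KEPT = ("name", "type", "fields", "symbols", "items", "values", "size")


def strip(node):
    """Recursively drop non-canonical attributes (sketch only)."""
    if isinstance(node, list):
        return [strip(n) for n in node]
    if isinstance(node, dict):
        kept = {k: strip(node[k]) for k in KEPT if k in node}
        # {"type": "long"} with nothing else left collapses to "long".
        if list(kept) == ["type"] and isinstance(kept["type"], str):
            return kept["type"]
        return kept
    return node


def canon(schema):
    """Serialize the stripped schema without extra whitespace."""
    return json.dumps(strip(schema), separators=(",", ":"))


with_logical = {
    "type": "record",
    "name": "test",
    "fields": [
        {"name": "c",
         "type": {"type": "long", "logicalType": "timestamp-micros"}},
    ],
}
plain = {
    "type": "record",
    "name": "test",
    "fields": [{"name": "c", "type": "long"}],
}

print(canon(with_logical))
# {"name":"test","type":"record","fields":[{"name":"c","type":"long"}]}
print(canon(with_logical) == canon(plain))  # True
```

Because both inputs map to the same canonical string, parsing that string back cannot tell which of the two schemas it came from.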

> [Rust] Canonical form transformation does not strip the logicalType 
> --------------------------------------------------------------------
>
>                 Key: AVRO-4004
>                 URL: https://issues.apache.org/jira/browse/AVRO-4004
>             Project: Apache Avro
>          Issue Type: Bug
>          Components: rust
>            Reporter: Dominik Mautz
>            Priority: Major
>
> The Rust implementation of the canonical form transformation does not strip 
> the _logicalType_ as required by the [STRIP] rule 
> ([https://avro.apache.org/docs/1.11.0/spec.html#Transforming+into+Parsing+Canonical+Form]).
> This results in different fingerprints for the same schema compared to other 
> implementations (at least Python and Java).
> This can become an issue, for instance, for kafka-delta-ingest 
> ([https://github.com/delta-io/kafka-delta-ingest]).
> Rust
> {code:java}
> [package]
> name = "avro-issue"
> version = "0.2.0"
> edition = "2018"
> [dependencies]
> apache-avro = "0.16.0"
> anyhow = "1.0.86"
> {code}
> {code:java}
> use anyhow::Result;
> use apache_avro::{rabin::Rabin, Schema};
> fn main() -> Result<()> {
>     let schema_str = r#"
>       {
>         "type": "record",
>         "name": "test",
>         "fields": [
>             {"name": "a", "type": "long", "default": 42, "doc": "The field a"},
>             {"name": "b", "type": "string", "namespace": "test.a"},
>             {"name": "c", "type": "long", "logicalType": "timestamp-micros"}
>         ]
>     }"#;
>     let schema =  Schema::parse_str(schema_str)?;
>     let canonical_form = schema.canonical_form();
>     let fp_rabin = schema.fingerprint::<Rabin>();
>     println!("Canonical form: {}", canonical_form);
>     println!("Rabin fingerprint: {}", fp_rabin);
>     Ok(())
> }
> {code}
> Output:
> {code:java}
> Canonical form: 
> {"name":"test","type":"record","fields":[{"name":"a","type":"long"},{"name":"b","type":"string"},{"name":"c","type":{"type":"long","logicalType":"timestamp-micros"}}]}
> Rabin fingerprint: 28cf0a67d9937bb3
> {code}
> As you can see, the _logicalType_ is still present in the "canonical form."
> Python
> {code:python}
>  
> import avro.schema
> schema_str = """
>     {
>         "type": "record",
>         "name": "test",
>         "fields": [
>             {"name": "a", "type": "long", "default": 42, "doc": "The field a"},
>             {"name": "b", "type": "string", "namespace": "test.a"},
>             {"name": "c", "type": "long", "logicalType": "timestamp-micros"}
>         ]
>     }"""
> schema = avro.schema.parse(schema_str)
> print(f"Canonical form: {schema.canonical_form}")
> print(f"Rabin fingerprint: {schema.fingerprint().hex()}")
> {code}
> Output:
> {code:java}
> Canonical form: 
> {"name":"test","type":"record","fields":[{"name":"a","type":"long"},{"name":"b","type":"string"},{"name":"c","type":"long"}]}
> Rabin fingerprint: 385501e341b00a1c
> {code}
> Java returns the same output as Python.
> IMHO, changing the line
> [https://github.com/apache/avro/blob/main/lang/rust/avro/src/schema.rs#L2159]
> to
> {code:java}
> //...
>  if field_ordering_position(k).is_none() || k == "default" || k == "doc" || k == "aliases" || k == "logicalType" {
> //...
>  {code}
> should resolve the issue. However, I am unsure whether this line should 
> include even more attributes (beyond those currently listed explicitly).
> Nevertheless, the test in 
> [https://github.com/apache/avro/blob/fdab5db0816e28e3e10c87910c8b6f98c33072dc/lang/rust/avro/src/schema.rs#L3388]
> must also be adapted to reflect the correct transformation of the canonical 
> form and the corresponding fingerprint.
> Rabin: 385501e341b00a1c
> MD5: 384f46367ef8c22dbbf44109b82ff7aa
> SHA-256: 8e72f58f2d84a59d6a08e8db5fdc6484dee35babf33179cea72889ae63083f36



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
