Hi,

I could not reach the users mailing list, so I am trying the dev mailing list.
I can't find a contact email for Jira account creation, and I can't join Slack
because I have no ASF email account.

The issue is described below.

Thanks,
Matthew

---------- Forwarded message ---------
From: Matthew Chng <[email protected]>
Date: Wed, Dec 7, 2022 at 4:18 PM
Subject: Possible bug with avro-js with binary serdes and schema resolution
To: <[email protected]>


Hi all,
I am encountering an issue with the avro-js NPM module: records serialized
into binary buffers cannot be read with a different but compatible (evolved)
reader schema. The issue only occurs with the `toBuffer()` and `fromBuffer()`
methods; the `toString()` and `fromString()` JSON serdes methods work as
expected.
The following is an example of an evolving schema; the only difference is
the additional `gender` field, which has a default value.

const parentV1Type = avro.parse({
  name: 'Parent',
  type: 'record',
  fields: [
    { name: 'name', type: 'string' }
  ]
})

const parentV2Type = avro.parse({
  name: 'Parent',
  type: 'record',
  fields: [
    { name: 'name', type: 'string' },
    { name: 'gender', type: 'string', default: 'unspecified' }
  ]
})

According to
https://avro.apache.org/docs/1.11.1/specification/#schema-resolution
the two schemas should be both backward and forward compatible for readers,
because they satisfy these rules:

   - both schemas are records with the same (unqualified) name
   - if the writer’s record contains a field with a name not present in the
   reader’s record, the writer’s value for that field is ignored.
   - if the reader’s record schema has a field that contains a default
   value, and writer’s schema does not have a field with the same name, then
   the reader should use the default value from its field.
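
As an illustration of the three rules above, here is a minimal sketch in
plain JavaScript. `resolveRecord` is a hypothetical helper written for this
example, not an avro-js function, and it only handles flat records. Note that
it needs both the writer's and the reader's field lists to do its work.

```javascript
// Hypothetical helper (not part of avro-js): projects a value that was
// decoded with the writer's record schema onto the reader's record schema.
function resolveRecord(writerFields, readerFields, writerValue) {
  const writerNames = new Set(writerFields.map((f) => f.name))
  const result = {}
  for (const field of readerFields) {
    if (writerNames.has(field.name)) {
      // Field known to both sides: keep the writer's value.
      result[field.name] = writerValue[field.name]
    } else if ('default' in field) {
      // Field only in the reader: fall back to the reader's default.
      result[field.name] = field.default
    } else {
      throw new Error(`no default for reader field "${field.name}"`)
    }
  }
  // Writer-only fields are never copied, so they are ignored.
  return result
}

const v1Fields = [{ name: 'name', type: 'string' }]
const v2Fields = [
  { name: 'name', type: 'string' },
  { name: 'gender', type: 'string', default: 'unspecified' }
]

const v2FromV1 = resolveRecord(v1Fields, v2Fields, { name: 'David' })
const v1FromV2 = resolveRecord(v2Fields, v1Fields, { name: 'David', gender: 'Father' })
console.log(JSON.stringify(v2FromV1)) // {"name":"David","gender":"unspecified"}
console.log(JSON.stringify(v1FromV2)) // {"name":"David"}
```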

Testing with the `toString()` and `fromString()` JSON serdes methods
confirms this behavior. I've created a simple test script to reproduce the
issue; it also includes a test with a nested schema. The script is included
after the output. The binary reads fail with one of two errors:

   - truncated buffer; or
   - trailing data

Script output:


--- JSON Writer: ParentV1, Reader: ParentV2 ---
parentV1Json:            {"name":"David"}
parentV2ReadFromV1Json:  {"name":"David","gender":"unspecified"}

--- JSON Writer: ParentV2, Reader: ParentV1 ---
parentV2Json:            {"name":"David","gender":"Father"}
parentV1ReadFromV2Json:  {"name":"David"}

--- Buffer Writer: ParentV1, Reader: ParentV1 ---
parentV1Buffer:            <Buffer 0a 44 61 76 69 64>
parentV1ReadFromV1Buffer:  {"name":"David"}

--- Buffer Writer: ParentV2, Reader: ParentV2 ---
parentV2Buffer:            <Buffer 0a 44 61 76 69 64 0c 46 61 74 68 65 72>
parentV2ReadFromV1Buffer:  {"name":"David","gender":"Father"}

--- Buffer Writer: ParentV1, Reader: ParentV2 ---
parentV1Buffer:           <Buffer 0a 44 61 76 69 64>
parentV2ReadFromV1Buffer: ERROR  truncated buffer

--- Buffer Writer: ParentV2, Reader: ParentV1 ---
parentV2Buffer:           <Buffer 0a 44 61 76 69 64 0c 46 61 74 68 65 72>
parentV1ReadFromV2Buffer: ERROR  trailing data

--- JSON Writer: meWithParentV1, Reader: meWithParentV2 ---
meWithParentV1Json:            {"name":"Davidson","parent":{"name":"David"}}
meWithParentV2ReadFromV1Json:  {"name":"Davidson","parent":{"name":"David","gender":"unspecified"}}

--- JSON Writer: meWithParentV2, Reader: meWithParentV1 ---
meWithParentV2Json:            {"name":"Davidson","parent":{"name":"David","gender":"Father"}}
meWithParentV1ReadFromV2Json:  {"name":"Davidson","parent":{"name":"David"}}

--- Buffer Writer: meWithParentV1, Reader: meWithParentV1 ---
meWithParentV1Buffer:            <Buffer 10 44 61 76 69 64 73 6f 6e 0a 44 61 76 69 64>
meWithParentV1ReadFromV1Buffer:  {"name":"Davidson","parent":{"name":"David"}}

--- Buffer Writer: meWithParentV2, Reader: meWithParentV2 ---
meWithParentV2Buffer:            <Buffer 10 44 61 76 69 64 73 6f 6e 0a 44 61 76 69 64 0c 46 61 74 68 65 72>
meWithParentV2ReadFromV2Buffer:  {"name":"Davidson","parent":{"name":"David","gender":"Father"}}

--- Buffer Writer: meWithParentV1, Reader: meWithParentV2 ---
meWithParentV1Buffer:           <Buffer 10 44 61 76 69 64 73 6f 6e 0a 44 61 76 69 64>
meWithParentV2ReadFromV1Buffer: ERROR  truncated buffer

--- Buffer Writer: meWithParentV2, Reader: meWithParentV1 ---
meWithParentV2Buffer:           <Buffer 10 44 61 76 69 64 73 6f 6e 0a 44 61 76 69 64 0c 46 61 74 68 65 72>
meWithParentV1ReadFromV2Buffer: ERROR  trailing data
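
One possible explanation for the two errors: Avro's binary encoding writes a
string as a zig-zag varint length followed by the UTF-8 bytes, with no field
names or markers, so a decoder can only walk the bytes using the field list it
is given. The sketch below is a hand-written, strings-only decoder, not
avro-js code; `readString`, `readStringRecord` and `attempt` are hypothetical
names, and the error messages are chosen to mirror the ones reported. Whether
avro-js performs its checks in exactly this way is an assumption.

```javascript
// Hypothetical minimal decoder for Avro strings (short lengths only).
function readString(buf, pos) {
  let n = 0
  let shift = 0
  let b
  do {
    if (pos >= buf.length) throw new Error('truncated buffer')
    b = buf[pos++]
    n |= (b & 0x7f) << shift
    shift += 7
  } while (b & 0x80)
  const length = (n >>> 1) ^ -(n & 1) // zig-zag decode
  if (pos + length > buf.length) throw new Error('truncated buffer')
  return { value: buf.toString('utf8', pos, pos + length), pos: pos + length }
}

// Decodes a record whose fields are all strings, using ONLY the given names.
function readStringRecord(fieldNames, buf) {
  const record = {}
  let pos = 0
  for (const name of fieldNames) {
    const res = readString(buf, pos)
    record[name] = res.value
    pos = res.pos
  }
  if (pos !== buf.length) throw new Error('trailing data')
  return record
}

function attempt(fieldNames, buf) {
  try {
    return JSON.stringify(readStringRecord(fieldNames, buf))
  } catch (e) {
    return 'ERROR ' + e.message
  }
}

// Same bytes as parentV1Buffer and parentV2Buffer in the output.
const v1Bytes = Buffer.from('0a4461766964', 'hex')
const v2Bytes = Buffer.from('0a44617669640c466174686572', 'hex')

console.log(attempt(['name'], v1Bytes))           // {"name":"David"}
console.log(attempt(['name', 'gender'], v2Bytes)) // {"name":"David","gender":"Father"}
console.log(attempt(['name', 'gender'], v1Bytes)) // ERROR truncated buffer
console.log(attempt(['name'], v2Bytes))           // ERROR trailing data
```

If this reading is right, a binary reader needs the writer's schema in
addition to its own. The avro-js API documentation lists
`type.createResolver(writerType)`, whose result can be passed as the second
argument to `fromBuffer()`; whether that resolves the errors above has not
been tested with this script.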


Script src:

const avro = require('avro-js')

// Version 1 of the Parent record: name only.
const parentV1Type = avro.parse({
  name: 'Parent',
  type: 'record',
  fields: [
    { name: 'name', type: 'string' }
  ]
})

// Version 2 adds a `gender` field with a default value.
const parentV2Type = avro.parse({
  name: 'Parent',
  type: 'record',
  fields: [
    { name: 'name', type: 'string' },
    { name: 'gender', type: 'string', default: 'unspecified' }
  ]
})

// Nested schema: `parent` refers to whichever Parent type is in the registry.
const meSchema = {
  name: 'Me',
  type: 'record',
  fields: [
    { name: 'name', type: 'string' },
    { name: 'parent', type: 'Parent' }
  ]
}

const meWithParentV2Type = avro.parse(meSchema, {
  registry: {
    Parent: parentV2Type
  }
})

const meWithParentV1Type = avro.parse(meSchema, {
  registry: {
    Parent: parentV1Type
  }
})

// Sample values for each schema version.
const parentV1 = { name: 'David' }
const parentV2 = { name: 'David', gender: 'Father' }

const meWithParentV1 = {
  name: 'Davidson',
  parent: parentV1
}

const meWithParentV2 = {
  name: 'Davidson',
  parent: parentV2
}

console.log("")
console.log("--- JSON Writer: ParentV1, Reader: ParentV2 ---")
const parentV1Json = parentV1Type.toString(parentV1)
console.log("parentV1Json: ", parentV1Json)
const parentV2ReadFromV1Json = parentV2Type.fromString(parentV1Json)
console.log("parentV2ReadFromV1Json: ", JSON.stringify(parentV2ReadFromV1Json))

console.log("")
console.log("--- JSON Writer: ParentV2, Reader: ParentV1 ---")
const parentV2Json = parentV2Type.toString(parentV2)
console.log("parentV2Json: ", parentV2Json)
const parentV1ReadFromV2Json = parentV1Type.fromString(parentV2Json)
console.log("parentV1ReadFromV2Json: ", JSON.stringify(parentV1ReadFromV2Json))

console.log("")
console.log("--- Buffer Writer: ParentV1, Reader: ParentV1 ---")
const parentV1Buffer = parentV1Type.toBuffer(parentV1)
console.log("parentV1Buffer: ", parentV1Buffer)
const parentV1ReadFromV1Buffer = parentV1Type.fromBuffer(parentV1Buffer)
console.log("parentV1ReadFromV1Buffer: ", JSON.stringify(parentV1ReadFromV1Buffer))

console.log("")
console.log("--- Buffer Writer: ParentV2, Reader: ParentV2 ---")
const parentV2Buffer = parentV2Type.toBuffer(parentV2)
console.log("parentV2Buffer: ", parentV2Buffer)
const parentV2ReadFromV2Buffer = parentV2Type.fromBuffer(parentV2Buffer)
console.log("parentV2ReadFromV1Buffer: ", JSON.stringify(parentV2ReadFromV2Buffer))

console.log("")
console.log("--- Buffer Writer: ParentV1, Reader: ParentV2 ---")
console.log("parentV1Buffer: ", parentV1Buffer)
try {
  const parentV2ReadFromV1Buffer = parentV2Type.fromBuffer(parentV1Buffer)
  console.log("parentV2ReadFromV1Buffer: ", JSON.stringify(parentV2ReadFromV1Buffer))
} catch (e) {
  console.log("parentV2ReadFromV1Buffer: ERROR ", e.message)
}

console.log("")
console.log("--- Buffer Writer: ParentV2, Reader: ParentV1 ---")
console.log("parentV2Buffer: ", parentV2Buffer)
try {
  const parentV1ReadFromV2Buffer = parentV1Type.fromBuffer(parentV2Buffer)
  console.log("parentV1ReadFromV2Buffer: ", JSON.stringify(parentV1ReadFromV2Buffer))
} catch (e) {
  console.log("parentV1ReadFromV2Buffer: ERROR ", e.message)
}

console.log("")
console.log("--- JSON Writer: meWithParentV1, Reader: meWithParentV2 ---")
const meWithParentV1Json = meWithParentV1Type.toString(meWithParentV1)
console.log("meWithParentV1Json: ", meWithParentV1Json)
const meWithParentV2ReadFromV1Json = meWithParentV2Type.fromString(meWithParentV1Json)
console.log("meWithParentV2ReadFromV1Json: ", JSON.stringify(meWithParentV2ReadFromV1Json))

console.log("")
console.log("--- JSON Writer: meWithParentV2, Reader: meWithParentV1 ---")
const meWithParentV2Json = meWithParentV2Type.toString(meWithParentV2)
console.log("meWithParentV2Json: ", meWithParentV2Json)
const meWithParentV1ReadFromV2Json = meWithParentV1Type.fromString(meWithParentV2Json)
console.log("meWithParentV1ReadFromV2Json: ", JSON.stringify(meWithParentV1ReadFromV2Json))

console.log("")
console.log("--- Buffer Writer: meWithParentV1, Reader: meWithParentV1 ---")
const meWithParentV1Buffer = meWithParentV1Type.toBuffer(meWithParentV1)
console.log("meWithParentV1Buffer: ", meWithParentV1Buffer)
const meWithParentV1ReadFromV1Buffer = meWithParentV1Type.fromBuffer(meWithParentV1Buffer)
console.log("meWithParentV1ReadFromV1Buffer: ", JSON.stringify(meWithParentV1ReadFromV1Buffer))

console.log("")
console.log("--- Buffer Writer: meWithParentV2, Reader: meWithParentV2 ---")
const meWithParentV2Buffer = meWithParentV2Type.toBuffer(meWithParentV2)
console.log("meWithParentV2Buffer: ", meWithParentV2Buffer)
const meWithParentV2ReadFromV2Buffer = meWithParentV2Type.fromBuffer(meWithParentV2Buffer)
console.log("meWithParentV2ReadFromV2Buffer: ", JSON.stringify(meWithParentV2ReadFromV2Buffer))

console.log("")
console.log("--- Buffer Writer: meWithParentV1, Reader: meWithParentV2 ---")
console.log("meWithParentV1Buffer: ", meWithParentV1Buffer)
try {
  const meWithParentV2ReadFromV1Buffer = meWithParentV2Type.fromBuffer(meWithParentV1Buffer)
  console.log("meWithParentV2ReadFromV1Buffer: ", JSON.stringify(meWithParentV2ReadFromV1Buffer))
} catch (e) {
  console.log("meWithParentV2ReadFromV1Buffer: ERROR ", e.message)
}

console.log("")
console.log("--- Buffer Writer: meWithParentV2, Reader: meWithParentV1 ---")
console.log("meWithParentV2Buffer: ", meWithParentV2Buffer)
try {
  const meWithParentV1ReadFromV2Buffer = meWithParentV1Type.fromBuffer(meWithParentV2Buffer)
  console.log("meWithParentV1ReadFromV2Buffer: ", JSON.stringify(meWithParentV1ReadFromV2Buffer))
} catch (e) {
  console.log("meWithParentV1ReadFromV2Buffer: ERROR ", e.message)
}


If this is indeed a bug, how do I create a ticket in the Jira board to
report it? Thanks.

Matthew
