[
https://issues.apache.org/jira/browse/AVRO-3380?focusedWorklogId=724969&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-724969
]
ASF GitHub Bot logged work on AVRO-3380:
----------------------------------------
Author: ASF GitHub Bot
Created on: 11/Feb/22 07:32
Start Date: 11/Feb/22 07:32
Worklog Time Spent: 10m
Work Description: jjaakola-aiven commented on a change in pull request
#1529:
URL: https://github.com/apache/avro/pull/1529#discussion_r804408932
##########
File path: lang/py/avro/io.py
##########
@@ -663,63 +666,69 @@ def read_data(self, writers_schema: avro.schema.Schema,
readers_schema: avro.sch
# This shouldn't happen because of the match check at the start of
this method.
raise avro.errors.SchemaResolutionException("Schemas do not
match.", writers_schema, readers_schema)
- if writers_schema.type == "null":
- return None
- if writers_schema.type == "boolean":
- return decoder.read_boolean()
- if writers_schema.type == "string":
- return decoder.read_utf8()
- if writers_schema.type == "int":
- if logical_type == avro.constants.DATE:
- return decoder.read_date_from_int()
- if logical_type == avro.constants.TIME_MILLIS:
- return decoder.read_time_millis_from_int()
- return decoder.read_int()
- if writers_schema.type == "long":
- if logical_type == avro.constants.TIME_MICROS:
- return decoder.read_time_micros_from_long()
- if logical_type == avro.constants.TIMESTAMP_MILLIS:
- return decoder.read_timestamp_millis_from_long()
- if logical_type == avro.constants.TIMESTAMP_MICROS:
- return decoder.read_timestamp_micros_from_long()
- return decoder.read_long()
- if writers_schema.type == "float":
- return decoder.read_float()
- if writers_schema.type == "double":
- return decoder.read_double()
- if writers_schema.type == "bytes":
- if logical_type == "decimal":
- precision = writers_schema.get_prop("precision")
- if not (isinstance(precision, int) and precision > 0):
- warnings.warn(avro.errors.IgnoredLogicalType(f"Invalid
decimal precision {precision}. Must be a positive integer."))
- return decoder.read_bytes()
- scale = writers_schema.get_prop("scale")
- if not (isinstance(scale, int) and scale > 0):
- warnings.warn(avro.errors.IgnoredLogicalType(f"Invalid
decimal scale {scale}. Must be a positive integer."))
- return decoder.read_bytes()
- return decoder.read_decimal_from_bytes(precision, scale)
- return decoder.read_bytes()
- if isinstance(writers_schema, avro.schema.FixedSchema) and
isinstance(readers_schema, avro.schema.FixedSchema):
- if logical_type == "decimal":
- precision = writers_schema.get_prop("precision")
- if not (isinstance(precision, int) and precision > 0):
- warnings.warn(avro.errors.IgnoredLogicalType(f"Invalid
decimal precision {precision}. Must be a positive integer."))
- return self.read_fixed(writers_schema, readers_schema,
decoder)
- scale = writers_schema.get_prop("scale")
- if not (isinstance(scale, int) and scale > 0):
- warnings.warn(avro.errors.IgnoredLogicalType(f"Invalid
decimal scale {scale}. Must be a positive integer."))
- return self.read_fixed(writers_schema, readers_schema,
decoder)
- return decoder.read_decimal_from_fixed(precision, scale,
writers_schema.size)
- return self.read_fixed(writers_schema, readers_schema, decoder)
- if isinstance(writers_schema, avro.schema.EnumSchema) and
isinstance(readers_schema, avro.schema.EnumSchema):
- return self.read_enum(writers_schema, readers_schema, decoder)
- if isinstance(writers_schema, avro.schema.ArraySchema) and
isinstance(readers_schema, avro.schema.ArraySchema):
- return self.read_array(writers_schema, readers_schema, decoder)
- if isinstance(writers_schema, avro.schema.MapSchema) and
isinstance(readers_schema, avro.schema.MapSchema):
- return self.read_map(writers_schema, readers_schema, decoder)
- if isinstance(writers_schema, avro.schema.RecordSchema) and
isinstance(readers_schema, avro.schema.RecordSchema):
- # .type in ["record", "error", "request"]:
- return self.read_record(writers_schema, readers_schema, decoder)
+ try:
+ if writers_schema.type == "null":
+ return None
+ if writers_schema.type == "boolean":
+ return decoder.read_boolean()
+ if writers_schema.type == "string":
+ return decoder.read_utf8()
+ if writers_schema.type == "int":
+ if logical_type == avro.constants.DATE:
+ return decoder.read_date_from_int()
+ if logical_type == avro.constants.TIME_MILLIS:
+ return decoder.read_time_millis_from_int()
+ return decoder.read_int()
+ if writers_schema.type == "long":
+ if logical_type == avro.constants.TIME_MICROS:
+ return decoder.read_time_micros_from_long()
+ if logical_type == avro.constants.TIMESTAMP_MILLIS:
+ return decoder.read_timestamp_millis_from_long()
+ if logical_type == avro.constants.TIMESTAMP_MICROS:
+ return decoder.read_timestamp_micros_from_long()
+ return decoder.read_long()
+ if writers_schema.type == "float":
+ return decoder.read_float()
+ if writers_schema.type == "double":
+ return decoder.read_double()
+ if writers_schema.type == "bytes":
+ if logical_type == "decimal":
+ precision = writers_schema.get_prop("precision")
+ if not (isinstance(precision, int) and precision > 0):
+ warnings.warn(avro.errors.IgnoredLogicalType(f"Invalid
decimal precision {precision}. Must be a positive integer."))
+ return decoder.read_bytes()
+ scale = writers_schema.get_prop("scale")
+ if not (isinstance(scale, int) and scale > 0):
+ warnings.warn(avro.errors.IgnoredLogicalType(f"Invalid
decimal scale {scale}. Must be a positive integer."))
+ return decoder.read_bytes()
+ return decoder.read_decimal_from_bytes(precision, scale)
+ return decoder.read_bytes()
+ if isinstance(writers_schema, avro.schema.FixedSchema) and
isinstance(readers_schema, avro.schema.FixedSchema):
+ if logical_type == "decimal":
+ precision = writers_schema.get_prop("precision")
+ if not (isinstance(precision, int) and precision > 0):
+ warnings.warn(avro.errors.IgnoredLogicalType(f"Invalid
decimal precision {precision}. Must be a positive integer."))
+ return self.read_fixed(writers_schema, readers_schema,
decoder)
+ scale = writers_schema.get_prop("scale")
+ if not (isinstance(scale, int) and scale > 0):
+ warnings.warn(avro.errors.IgnoredLogicalType(f"Invalid
decimal scale {scale}. Must be a positive integer."))
+ return self.read_fixed(writers_schema, readers_schema,
decoder)
+ return decoder.read_decimal_from_fixed(precision, scale,
writers_schema.size)
+ return self.read_fixed(writers_schema, readers_schema, decoder)
+ if isinstance(writers_schema, avro.schema.EnumSchema) and
isinstance(readers_schema, avro.schema.EnumSchema):
+ return self.read_enum(writers_schema, readers_schema, decoder)
+ if isinstance(writers_schema, avro.schema.ArraySchema) and
isinstance(readers_schema, avro.schema.ArraySchema):
+ return self.read_array(writers_schema, readers_schema, decoder)
+ if isinstance(writers_schema, avro.schema.MapSchema) and
isinstance(readers_schema, avro.schema.MapSchema):
+ return self.read_map(writers_schema, readers_schema, decoder)
+ if isinstance(writers_schema, avro.schema.RecordSchema) and
isinstance(readers_schema, avro.schema.RecordSchema):
+ # .type in ["record", "error", "request"]:
+ return self.read_record(writers_schema, readers_schema,
decoder)
+ except avro.errors.InvalidBytesRead as e:
+ decoder.reader.seek(0)
Review comment:
The idea here was to map the specific read error to something more
helpful when the encoding is off. Logging the data and the schema used would
help in finding the issue.
But I'll remove this. I think logging the data is wrong, as there are
environments where the data in the datum is sensitive.
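A minimal sketch of the alternative, assuming a hypothetical wrapper: the
low-level read error is mapped to an error that names only the schema types,
so no datum bytes end up in logs. `InvalidBytesRead` and `SchemaReadError`
below are stand-in classes defined locally for illustration, and
`read_with_context` is a made-up helper, not part of avro.io.

```python
class InvalidBytesRead(Exception):
    """Local stand-in for the low-level read error (hypothetical)."""


class SchemaReadError(Exception):
    """Hypothetical error carrying schema context but never the datum."""


def read_with_context(read_fn, writers_type, readers_type):
    """Call read_fn and re-raise read failures with schema types attached."""
    try:
        return read_fn()
    except InvalidBytesRead as e:
        # Only the schema type names go into the message; the raw bytes are
        # deliberately left out because the datum may be sensitive.
        raise SchemaReadError(
            f"Failed to read data written as {writers_type!r} "
            f"using reader schema {readers_type!r}: {e}"
        ) from e
```

The original error stays reachable through `__cause__`, so the byte counts
are still available for debugging without the payload itself being logged.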
Issue Time Tracking
-------------------
Worklog Id: (was: 724969)
Time Spent: 1h 40m (was: 1.5h)
> Byte reading in avro.io does not assert read bytes to requested nbytes
> ----------------------------------------------------------------------
>
> Key: AVRO-3380
> URL: https://issues.apache.org/jira/browse/AVRO-3380
> Project: Apache Avro
> Issue Type: Bug
> Components: python
> Affects Versions: 1.11.0
> Reporter: Jarkko Jaakola
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1h 40m
> Remaining Estimate: 0h
>
> The Python 3 compatibility layer in version 1.10.0 asserted that the number
> of bytes read matched the requested number.
> In version 1.11.0 the read returns whatever is available and just proceeds.
> This can be a problem with incompatible schemas or some other unexpected
> condition.
> 1.10.0 implementation:
> [https://github.com/apache/avro/blob/release-1.10.0/lang/py3/avro/io.py#L158]
> 1.11.0 implementation:
> [https://github.com/apache/avro/blob/443614c12a15bb58fcf2487eb67ca6f885a68f96/lang/py/avro/io.py#L225]
>
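The check described in the issue can be sketched roughly as follows. This is
an illustration under assumptions, not the code from either release:
`read_exact` is a hypothetical helper and `InvalidBytesRead` is a local
stand-in error class.

```python
import io


class InvalidBytesRead(Exception):
    """Local stand-in error for a short read (hypothetical)."""


def read_exact(reader, nbytes):
    """Read exactly nbytes from reader, or raise InvalidBytesRead."""
    if nbytes < 0:
        raise ValueError(f"Requested {nbytes} bytes; must be non-negative.")
    data = reader.read(nbytes)
    # A file-like read() may return fewer bytes than requested at end of
    # stream; treat that as an error instead of silently continuing.
    if len(data) != nbytes:
        raise InvalidBytesRead(f"Expected {nbytes} bytes, read {len(data)}.")
    return data
```

For example, `read_exact(io.BytesIO(b"abc"), 2)` returns the first two bytes,
while asking for four bytes from the same three-byte stream raises the error.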