[
https://issues.apache.org/jira/browse/AVRO-3380?focusedWorklogId=724671&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-724671
]
ASF GitHub Bot logged work on AVRO-3380:
----------------------------------------
Author: ASF GitHub Bot
Created on: 10/Feb/22 18:20
Start Date: 10/Feb/22 18:20
Worklog Time Spent: 10m
Work Description: RyanSkraba commented on a change in pull request #1529:
URL: https://github.com/apache/avro/pull/1529#discussion_r803974606
##########
File path: lang/py/avro/io.py
##########
@@ -222,7 +222,10 @@ def read(self, n: int) -> bytes:
         """
         Read n bytes.
         """
-        return self.reader.read(n)
+        read_bytes = self.reader.read(n)
Review comment:
You mentioned that the original `avro-python3` code also checked that n
is non-negative. I think that's a good idea too!
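A minimal sketch of what such a check could look like, combining the
non-negative test on n with the length assertion this issue asks for. The
CheckedReader class and the locally defined InvalidBytesRead are stand-ins
for illustration only; they are assumptions, not the code in this pull
request.

```python
import io


class InvalidBytesRead(Exception):
    """Local stand-in for illustration; not imported from avro.errors."""


class CheckedReader:
    """Hypothetical wrapper around a file-like object opened in binary mode."""

    def __init__(self, reader):
        self.reader = reader

    def read(self, n: int) -> bytes:
        """Read exactly n bytes or raise InvalidBytesRead."""
        # Reject negative counts up front: read(-1) on a file-like object
        # would otherwise read to the end of the stream.
        if n < 0:
            raise InvalidBytesRead(f"Requested {n} bytes to read, expected a non-negative count.")
        read_bytes = self.reader.read(n)
        # A short read means the stream ended before n bytes were available.
        if len(read_bytes) != n:
            raise InvalidBytesRead(f"Read {len(read_bytes)} bytes, expected {n} bytes.")
        return read_bytes
```

With this in place, CheckedReader(io.BytesIO(b"abc")).read(5) raises instead
of silently returning three bytes.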
##########
File path: lang/py/avro/io.py
##########
@@ -663,63 +666,69 @@ def read_data(self, writers_schema: avro.schema.Schema, readers_schema: avro.sch
             # This shouldn't happen because of the match check at the start of this method.
             raise avro.errors.SchemaResolutionException("Schemas do not match.", writers_schema, readers_schema)
-        if writers_schema.type == "null":
-            return None
-        if writers_schema.type == "boolean":
-            return decoder.read_boolean()
-        if writers_schema.type == "string":
-            return decoder.read_utf8()
-        if writers_schema.type == "int":
-            if logical_type == avro.constants.DATE:
-                return decoder.read_date_from_int()
-            if logical_type == avro.constants.TIME_MILLIS:
-                return decoder.read_time_millis_from_int()
-            return decoder.read_int()
-        if writers_schema.type == "long":
-            if logical_type == avro.constants.TIME_MICROS:
-                return decoder.read_time_micros_from_long()
-            if logical_type == avro.constants.TIMESTAMP_MILLIS:
-                return decoder.read_timestamp_millis_from_long()
-            if logical_type == avro.constants.TIMESTAMP_MICROS:
-                return decoder.read_timestamp_micros_from_long()
-            return decoder.read_long()
-        if writers_schema.type == "float":
-            return decoder.read_float()
-        if writers_schema.type == "double":
-            return decoder.read_double()
-        if writers_schema.type == "bytes":
-            if logical_type == "decimal":
-                precision = writers_schema.get_prop("precision")
-                if not (isinstance(precision, int) and precision > 0):
-                    warnings.warn(avro.errors.IgnoredLogicalType(f"Invalid decimal precision {precision}. Must be a positive integer."))
-                    return decoder.read_bytes()
-                scale = writers_schema.get_prop("scale")
-                if not (isinstance(scale, int) and scale > 0):
-                    warnings.warn(avro.errors.IgnoredLogicalType(f"Invalid decimal scale {scale}. Must be a positive integer."))
-                    return decoder.read_bytes()
-                return decoder.read_decimal_from_bytes(precision, scale)
-            return decoder.read_bytes()
-        if isinstance(writers_schema, avro.schema.FixedSchema) and isinstance(readers_schema, avro.schema.FixedSchema):
-            if logical_type == "decimal":
-                precision = writers_schema.get_prop("precision")
-                if not (isinstance(precision, int) and precision > 0):
-                    warnings.warn(avro.errors.IgnoredLogicalType(f"Invalid decimal precision {precision}. Must be a positive integer."))
-                    return self.read_fixed(writers_schema, readers_schema, decoder)
-                scale = writers_schema.get_prop("scale")
-                if not (isinstance(scale, int) and scale > 0):
-                    warnings.warn(avro.errors.IgnoredLogicalType(f"Invalid decimal scale {scale}. Must be a positive integer."))
-                    return self.read_fixed(writers_schema, readers_schema, decoder)
-                return decoder.read_decimal_from_fixed(precision, scale, writers_schema.size)
-            return self.read_fixed(writers_schema, readers_schema, decoder)
-        if isinstance(writers_schema, avro.schema.EnumSchema) and isinstance(readers_schema, avro.schema.EnumSchema):
-            return self.read_enum(writers_schema, readers_schema, decoder)
-        if isinstance(writers_schema, avro.schema.ArraySchema) and isinstance(readers_schema, avro.schema.ArraySchema):
-            return self.read_array(writers_schema, readers_schema, decoder)
-        if isinstance(writers_schema, avro.schema.MapSchema) and isinstance(readers_schema, avro.schema.MapSchema):
-            return self.read_map(writers_schema, readers_schema, decoder)
-        if isinstance(writers_schema, avro.schema.RecordSchema) and isinstance(readers_schema, avro.schema.RecordSchema):
-            # .type in ["record", "error", "request"]:
-            return self.read_record(writers_schema, readers_schema, decoder)
+        try:
+            if writers_schema.type == "null":
+                return None
+            if writers_schema.type == "boolean":
+                return decoder.read_boolean()
+            if writers_schema.type == "string":
+                return decoder.read_utf8()
+            if writers_schema.type == "int":
+                if logical_type == avro.constants.DATE:
+                    return decoder.read_date_from_int()
+                if logical_type == avro.constants.TIME_MILLIS:
+                    return decoder.read_time_millis_from_int()
+                return decoder.read_int()
+            if writers_schema.type == "long":
+                if logical_type == avro.constants.TIME_MICROS:
+                    return decoder.read_time_micros_from_long()
+                if logical_type == avro.constants.TIMESTAMP_MILLIS:
+                    return decoder.read_timestamp_millis_from_long()
+                if logical_type == avro.constants.TIMESTAMP_MICROS:
+                    return decoder.read_timestamp_micros_from_long()
+                return decoder.read_long()
+            if writers_schema.type == "float":
+                return decoder.read_float()
+            if writers_schema.type == "double":
+                return decoder.read_double()
+            if writers_schema.type == "bytes":
+                if logical_type == "decimal":
+                    precision = writers_schema.get_prop("precision")
+                    if not (isinstance(precision, int) and precision > 0):
+                        warnings.warn(avro.errors.IgnoredLogicalType(f"Invalid decimal precision {precision}. Must be a positive integer."))
+                        return decoder.read_bytes()
+                    scale = writers_schema.get_prop("scale")
+                    if not (isinstance(scale, int) and scale > 0):
+                        warnings.warn(avro.errors.IgnoredLogicalType(f"Invalid decimal scale {scale}. Must be a positive integer."))
+                        return decoder.read_bytes()
+                    return decoder.read_decimal_from_bytes(precision, scale)
+                return decoder.read_bytes()
+            if isinstance(writers_schema, avro.schema.FixedSchema) and isinstance(readers_schema, avro.schema.FixedSchema):
+                if logical_type == "decimal":
+                    precision = writers_schema.get_prop("precision")
+                    if not (isinstance(precision, int) and precision > 0):
+                        warnings.warn(avro.errors.IgnoredLogicalType(f"Invalid decimal precision {precision}. Must be a positive integer."))
+                        return self.read_fixed(writers_schema, readers_schema, decoder)
+                    scale = writers_schema.get_prop("scale")
+                    if not (isinstance(scale, int) and scale > 0):
+                        warnings.warn(avro.errors.IgnoredLogicalType(f"Invalid decimal scale {scale}. Must be a positive integer."))
+                        return self.read_fixed(writers_schema, readers_schema, decoder)
+                    return decoder.read_decimal_from_fixed(precision, scale, writers_schema.size)
+                return self.read_fixed(writers_schema, readers_schema, decoder)
+            if isinstance(writers_schema, avro.schema.EnumSchema) and isinstance(readers_schema, avro.schema.EnumSchema):
+                return self.read_enum(writers_schema, readers_schema, decoder)
+            if isinstance(writers_schema, avro.schema.ArraySchema) and isinstance(readers_schema, avro.schema.ArraySchema):
+                return self.read_array(writers_schema, readers_schema, decoder)
+            if isinstance(writers_schema, avro.schema.MapSchema) and isinstance(readers_schema, avro.schema.MapSchema):
+                return self.read_map(writers_schema, readers_schema, decoder)
+            if isinstance(writers_schema, avro.schema.RecordSchema) and isinstance(readers_schema, avro.schema.RecordSchema):
+                # .type in ["record", "error", "request"]:
+                return self.read_record(writers_schema, readers_schema, decoder)
+        except avro.errors.InvalidBytesRead as e:
+            decoder.reader.seek(0)
Review comment:
I'm not sure this is the right thing to do here: I like the original
message in `InvalidBytesRead` more than the one constructed by
`AvroTypeException`.
Is there something nice that we can do in the `avro.errors` hierarchy to
make this useful?
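One possible answer, sketched under assumptions: re-raise with exception
chaining so that the original short-read message survives inside the
higher-level error. All classes below are local stand-ins defined only for
this sketch (the real avro.errors classes may take different constructor
arguments), and read_exactly and read_datum are hypothetical helper names.

```python
import io


class AvroException(Exception):
    """Local stand-in base class for this sketch."""


class InvalidBytesRead(AvroException):
    """Local stand-in: raised when fewer bytes than requested were read."""


class AvroTypeException(AvroException):
    """Local stand-in: raised when a datum cannot be decoded."""


def read_exactly(reader, n: int) -> bytes:
    """Hypothetical helper: read exactly n bytes or raise InvalidBytesRead."""
    data = reader.read(n)
    if len(data) != n:
        raise InvalidBytesRead(f"Read {len(data)} bytes, expected {n} bytes.")
    return data


def read_datum(reader, n: int) -> bytes:
    """Hypothetical caller that adds context without losing the original message."""
    try:
        return read_exactly(reader, n)
    except InvalidBytesRead as e:
        # "from e" stores the original exception in __cause__, and the
        # original text is also embedded in the new message.
        raise AvroTypeException(f"Could not decode datum: {e}") from e
```

Because both stand-ins share a base class, callers can catch either the
specific error or the common AvroException, and the traceback shows both.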
Issue Time Tracking
-------------------
Worklog Id: (was: 724671)
Time Spent: 1.5h (was: 1h 20m)
> Byte reading in avro.io does not assert read bytes to requested nbytes
> ----------------------------------------------------------------------
>
> Key: AVRO-3380
> URL: https://issues.apache.org/jira/browse/AVRO-3380
> Project: Apache Avro
> Issue Type: Bug
> Components: python
> Affects Versions: 1.11.0
> Reporter: Jarkko Jaakola
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1.5h
> Remaining Estimate: 0h
>
> The Python 3 compatibility layer in version 1.10.0 asserted the number of
> read bytes to match the requested number.
> In version 1.11.0 the read returns what is available and just progresses.
> This can be a problem with incompatible schemas or some other unexpected
> condition.
> 1.10.0 implementation:
> [https://github.com/apache/avro/blob/release-1.10.0/lang/py3/avro/io.py#L158]
> 1.11.0 implementation:
> [https://github.com/apache/avro/blob/443614c12a15bb58fcf2487eb67ca6f885a68f96/lang/py/avro/io.py#L225]
>
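To illustrate the behavior described above, here is a small standalone
sketch using only the Python standard library. It does not call avro; the
function name short_read_demo is hypothetical. It shows that a file-like
read(n) returns fewer bytes without raising, so the failure only surfaces
later, if at all.

```python
import io
import struct


def short_read_demo():
    """Request 4 bytes from a stream that only holds 2 and report what happens."""
    stream = io.BytesIO(b"\x01\x02")
    data = stream.read(4)  # No exception: read() returns what is available.
    try:
        # A fixed-width decode fails later, far from the actual short read.
        struct.unpack("<f", data)
        decode_failed = False
    except struct.error:
        decode_failed = True
    return data, decode_failed


data, decode_failed = short_read_demo()
print(len(data), decode_failed)
```

Variable-length decoders may not fail at all on a short read, which is why
checking the length at the point of the read is the safer place.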