[jira] [Work logged] (AVRO-3380) Byte reading in avro.io does not assert read bytes to requested nbytes

ASF GitHub Bot (Jira) Thu, 10 Feb 2022 23:33:06 -0800


     [ 
https://issues.apache.org/jira/browse/AVRO-3380?focusedWorklogId=724969&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-724969
 ]


ASF GitHub Bot logged work on AVRO-3380:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 11/Feb/22 07:32
            Start Date: 11/Feb/22 07:32
    Worklog Time Spent: 10m 
      Work Description: jjaakola-aiven commented on a change in pull request #1529:
URL: https://github.com/apache/avro/pull/1529#discussion_r804408932



##########
File path: lang/py/avro/io.py
##########
@@ -663,63 +666,69 @@ def read_data(self, writers_schema: avro.schema.Schema, readers_schema: avro.sch
             # This shouldn't happen because of the match check at the start of this method.
             raise avro.errors.SchemaResolutionException("Schemas do not match.", writers_schema, readers_schema)
 
-        if writers_schema.type == "null":
-            return None
-        if writers_schema.type == "boolean":
-            return decoder.read_boolean()
-        if writers_schema.type == "string":
-            return decoder.read_utf8()
-        if writers_schema.type == "int":
-            if logical_type == avro.constants.DATE:
-                return decoder.read_date_from_int()
-            if logical_type == avro.constants.TIME_MILLIS:
-                return decoder.read_time_millis_from_int()
-            return decoder.read_int()
-        if writers_schema.type == "long":
-            if logical_type == avro.constants.TIME_MICROS:
-                return decoder.read_time_micros_from_long()
-            if logical_type == avro.constants.TIMESTAMP_MILLIS:
-                return decoder.read_timestamp_millis_from_long()
-            if logical_type == avro.constants.TIMESTAMP_MICROS:
-                return decoder.read_timestamp_micros_from_long()
-            return decoder.read_long()
-        if writers_schema.type == "float":
-            return decoder.read_float()
-        if writers_schema.type == "double":
-            return decoder.read_double()
-        if writers_schema.type == "bytes":
-            if logical_type == "decimal":
-                precision = writers_schema.get_prop("precision")
-                if not (isinstance(precision, int) and precision > 0):
-                    warnings.warn(avro.errors.IgnoredLogicalType(f"Invalid decimal precision {precision}. Must be a positive integer."))
-                    return decoder.read_bytes()
-                scale = writers_schema.get_prop("scale")
-                if not (isinstance(scale, int) and scale > 0):
-                    warnings.warn(avro.errors.IgnoredLogicalType(f"Invalid decimal scale {scale}. Must be a positive integer."))
-                    return decoder.read_bytes()
-                return decoder.read_decimal_from_bytes(precision, scale)
-            return decoder.read_bytes()
-        if isinstance(writers_schema, avro.schema.FixedSchema) and isinstance(readers_schema, avro.schema.FixedSchema):
-            if logical_type == "decimal":
-                precision = writers_schema.get_prop("precision")
-                if not (isinstance(precision, int) and precision > 0):
-                    warnings.warn(avro.errors.IgnoredLogicalType(f"Invalid decimal precision {precision}. Must be a positive integer."))
-                    return self.read_fixed(writers_schema, readers_schema, decoder)
-                scale = writers_schema.get_prop("scale")
-                if not (isinstance(scale, int) and scale > 0):
-                    warnings.warn(avro.errors.IgnoredLogicalType(f"Invalid decimal scale {scale}. Must be a positive integer."))
-                    return self.read_fixed(writers_schema, readers_schema, decoder)
-                return decoder.read_decimal_from_fixed(precision, scale, writers_schema.size)
-            return self.read_fixed(writers_schema, readers_schema, decoder)
-        if isinstance(writers_schema, avro.schema.EnumSchema) and isinstance(readers_schema, avro.schema.EnumSchema):
-            return self.read_enum(writers_schema, readers_schema, decoder)
-        if isinstance(writers_schema, avro.schema.ArraySchema) and isinstance(readers_schema, avro.schema.ArraySchema):
-            return self.read_array(writers_schema, readers_schema, decoder)
-        if isinstance(writers_schema, avro.schema.MapSchema) and isinstance(readers_schema, avro.schema.MapSchema):
-            return self.read_map(writers_schema, readers_schema, decoder)
-        if isinstance(writers_schema, avro.schema.RecordSchema) and isinstance(readers_schema, avro.schema.RecordSchema):
-            # .type in ["record", "error", "request"]:
-            return self.read_record(writers_schema, readers_schema, decoder)
+        try:
+            if writers_schema.type == "null":
+                return None
+            if writers_schema.type == "boolean":
+                return decoder.read_boolean()
+            if writers_schema.type == "string":
+                return decoder.read_utf8()
+            if writers_schema.type == "int":
+                if logical_type == avro.constants.DATE:
+                    return decoder.read_date_from_int()
+                if logical_type == avro.constants.TIME_MILLIS:
+                    return decoder.read_time_millis_from_int()
+                return decoder.read_int()
+            if writers_schema.type == "long":
+                if logical_type == avro.constants.TIME_MICROS:
+                    return decoder.read_time_micros_from_long()
+                if logical_type == avro.constants.TIMESTAMP_MILLIS:
+                    return decoder.read_timestamp_millis_from_long()
+                if logical_type == avro.constants.TIMESTAMP_MICROS:
+                    return decoder.read_timestamp_micros_from_long()
+                return decoder.read_long()
+            if writers_schema.type == "float":
+                return decoder.read_float()
+            if writers_schema.type == "double":
+                return decoder.read_double()
+            if writers_schema.type == "bytes":
+                if logical_type == "decimal":
+                    precision = writers_schema.get_prop("precision")
+                    if not (isinstance(precision, int) and precision > 0):
+                        warnings.warn(avro.errors.IgnoredLogicalType(f"Invalid decimal precision {precision}. Must be a positive integer."))
+                        return decoder.read_bytes()
+                    scale = writers_schema.get_prop("scale")
+                    if not (isinstance(scale, int) and scale > 0):
+                        warnings.warn(avro.errors.IgnoredLogicalType(f"Invalid decimal scale {scale}. Must be a positive integer."))
+                        return decoder.read_bytes()
+                    return decoder.read_decimal_from_bytes(precision, scale)
+                return decoder.read_bytes()
+            if isinstance(writers_schema, avro.schema.FixedSchema) and isinstance(readers_schema, avro.schema.FixedSchema):
+                if logical_type == "decimal":
+                    precision = writers_schema.get_prop("precision")
+                    if not (isinstance(precision, int) and precision > 0):
+                        warnings.warn(avro.errors.IgnoredLogicalType(f"Invalid decimal precision {precision}. Must be a positive integer."))
+                        return self.read_fixed(writers_schema, readers_schema, decoder)
+                    scale = writers_schema.get_prop("scale")
+                    if not (isinstance(scale, int) and scale > 0):
+                        warnings.warn(avro.errors.IgnoredLogicalType(f"Invalid decimal scale {scale}. Must be a positive integer."))
+                        return self.read_fixed(writers_schema, readers_schema, decoder)
+                    return decoder.read_decimal_from_fixed(precision, scale, writers_schema.size)
+                return self.read_fixed(writers_schema, readers_schema, decoder)
+            if isinstance(writers_schema, avro.schema.EnumSchema) and isinstance(readers_schema, avro.schema.EnumSchema):
+                return self.read_enum(writers_schema, readers_schema, decoder)
+            if isinstance(writers_schema, avro.schema.ArraySchema) and isinstance(readers_schema, avro.schema.ArraySchema):
+                return self.read_array(writers_schema, readers_schema, decoder)
+            if isinstance(writers_schema, avro.schema.MapSchema) and isinstance(readers_schema, avro.schema.MapSchema):
+                return self.read_map(writers_schema, readers_schema, decoder)
+            if isinstance(writers_schema, avro.schema.RecordSchema) and isinstance(readers_schema, avro.schema.RecordSchema):
+                # .type in ["record", "error", "request"]:
+                return self.read_record(writers_schema, readers_schema, decoder)
+        except avro.errors.InvalidBytesRead as e:
+            decoder.reader.seek(0)

Review comment:
   The idea here was to map the specific read error to something more helpful
   when the encoding is off. Logging the data and the schema used would help
   in finding the issue.
   But I'll remove this. I think logging the data is wrong, as there are
   environments where the data in the datum is sensitive.
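
One way to keep the helpful context without logging the datum is to re-raise the read error with only the schema name attached. The sketch below is illustrative and is not the code in the pull request: `DecodeContextError`, `read_exact`, and `read_double_with_context` are hypothetical names, and `InvalidBytesRead` is a local stand-in for the exception named in the diff.

```python
import io
import struct


class InvalidBytesRead(Exception):
    """Local stand-in for the short-read error named in the diff."""


class DecodeContextError(Exception):
    """Hypothetical error that carries schema context but never the payload."""


def read_exact(reader, nbytes):
    """Return exactly nbytes from reader, or raise InvalidBytesRead."""
    data = reader.read(nbytes)
    if len(data) != nbytes:
        raise InvalidBytesRead(f"expected {nbytes} bytes, read {len(data)}")
    return data


def read_double_with_context(reader, schema_name):
    """Decode one Avro double (8 bytes, little-endian) from reader."""
    try:
        raw = read_exact(reader, 8)
    except InvalidBytesRead as exc:
        # Report the schema name and byte counts only; the bytes themselves
        # are left out because the datum may be sensitive.
        raise DecodeContextError(f"cannot decode {schema_name!r}: {exc}") from exc
    return struct.unpack("<d", raw)[0]
```

The original exception stays available through `__cause__`, so a caller can still tell a short read apart from other failures without the payload ever reaching a log.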




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 724969)
    Time Spent: 1h 40m  (was: 1.5h)

> Byte reading in avro.io does not assert read bytes to requested nbytes
> ----------------------------------------------------------------------
>
>                 Key: AVRO-3380
>                 URL: https://issues.apache.org/jira/browse/AVRO-3380
>             Project: Apache Avro
>          Issue Type: Bug
>          Components: python
>    Affects Versions: 1.11.0
>            Reporter: Jarkko Jaakola
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> The Python 3 compatibility layer in version 1.10.0 asserted that the number 
> of bytes read matched the number requested.
> In version 1.11.0 the read returns whatever is available and simply continues. 
> This can be a problem when the schemas are incompatible or some other 
> unexpected condition occurs.
> 1.10.0 implementation: 
> [https://github.com/apache/avro/blob/release-1.10.0/lang/py3/avro/io.py#L158]
> 1.11.0 implementation: 
> [https://github.com/apache/avro/blob/443614c12a15bb58fcf2487eb67ca6f885a68f96/lang/py/avro/io.py#L225]
>  
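
The difference described above can be sketched as follows. This is a minimal illustration, not the code from either release: `read_unchecked`, `read_checked`, and the local `InvalidBytesRead` class are hypothetical names, and the reader is assumed to be an in-memory or buffered stream whose `read(n)` returns fewer than `n` bytes only at end of stream.

```python
import io


class InvalidBytesRead(Exception):
    """Local stand-in for an error signalling a short read."""


def read_unchecked(reader, nbytes):
    """Return whatever the reader yields, even if it is fewer than nbytes."""
    return reader.read(nbytes)


def read_checked(reader, nbytes):
    """Return exactly nbytes, or raise InvalidBytesRead on a short read."""
    if nbytes < 0:
        # A negative count would make read() consume the rest of the stream.
        raise InvalidBytesRead(f"requested a negative number of bytes: {nbytes}")
    data = reader.read(nbytes)
    if len(data) != nbytes:
        raise InvalidBytesRead(f"expected {nbytes} bytes, read {len(data)}")
    return data
```

With a two-byte stream and a four-byte request, `read_unchecked` quietly returns two bytes and decoding carries on with truncated input, while `read_checked` fails at the point where the mismatch first appears.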



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
