[
https://issues.apache.org/jira/browse/AVRO-3261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17516399#comment-17516399
]
Michael A. Smith commented on AVRO-3261:
----------------------------------------
If your avro file is an [object container
file|https://avro.apache.org/docs/1.11.0/spec.html#Object+Container+Files],
then to know how large the header is in bytes, we'd need to know:
# The compression codec used
# The size of the schema
# The size of any other arbitrary metadata
The Avro spec doesn't seem to give us enough guarantees to know this
information up front, so I'm not sure how we could go about accomplishing it.
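As an illustration of why the header size is data-dependent, one *could* find it by parsing the header itself: fetch an initial byte range (e.g. via the {{Range}} parameter of boto3's {{get_object}}), try to parse, and fetch a larger range if the parse runs off the end. The sketch below, in plain Python with no S3 or Avro dependency, builds and parses a minimal object container header per the spec (magic, metadata map, 16-byte sync marker); the function names and the incremental-fetch idea are mine, not part of any Avro library:

```python
import io
import json

# -- Avro primitive: zigzag-varint "long" encoding ---------------------------

def write_long(buf, n):
    """Append an Avro zigzag-varint long to a binary buffer."""
    n = (n << 1) ^ (n >> 63)          # zigzag: small magnitudes -> few bytes
    while True:
        b = n & 0x7F
        n >>= 7
        if n:
            buf.write(bytes([b | 0x80]))
        else:
            buf.write(bytes([b]))
            return

def read_long(stream):
    """Read an Avro zigzag-varint long from a binary stream."""
    shift, acc = 0, 0
    while True:
        b = stream.read(1)[0]
        acc |= (b & 0x7F) << shift
        if not (b & 0x80):
            break
        shift += 7
    return (acc >> 1) ^ -(acc & 1)    # undo zigzag

# -- Object container file header --------------------------------------------

MAGIC = b"Obj\x01"

def build_header(schema_json, codec=b"null", sync=b"\x00" * 16):
    """Serialize a minimal OCF header: magic, metadata map, sync marker."""
    buf = io.BytesIO()
    buf.write(MAGIC)
    entries = {"avro.schema": schema_json.encode(), "avro.codec": codec}
    write_long(buf, len(entries))     # one map block holding all entries
    for key, value in entries.items():
        kb = key.encode()
        write_long(buf, len(kb)); buf.write(kb)
        write_long(buf, len(value)); buf.write(value)
    write_long(buf, 0)                # a zero count terminates the map
    buf.write(sync)
    return buf.getvalue()

def parse_header(stream):
    """Parse an OCF header; return (metadata dict, sync marker, header size)."""
    if stream.read(4) != MAGIC:
        raise ValueError("not an Avro object container file")
    meta = {}
    while True:
        count = read_long(stream)
        if count == 0:
            break
        if count < 0:                 # negative count: block byte size follows
            count = -count
            read_long(stream)
        for _ in range(count):
            key = stream.read(read_long(stream)).decode()
            meta[key] = stream.read(read_long(stream))
    sync = stream.read(16)
    return meta, sync, stream.tell()

schema = json.dumps({"type": "record", "name": "R",
                     "fields": [{"name": "x", "type": "long"}]})
header_bytes = build_header(schema)
meta, sync, size = parse_header(io.BytesIO(header_bytes))
print(size, meta["avro.codec"])      # header size depends on the schema length
```

Note that even with this, only the schema and codec come cheap: the per-block record counts live in every block header throughout the file, so a total row count would still require touching every block, not just the file header.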
> s3 byte range for just schema or row count
> ------------------------------------------
>
> Key: AVRO-3261
> URL: https://issues.apache.org/jira/browse/AVRO-3261
> Project: Apache Avro
> Issue Type: Wish
> Components: community, misc, python, tools
> Affects Versions: 1.11.0
> Reporter: t oo
> Priority: Major
>
> boto3 python library has [s3
> get_object|https://boto3.amazonaws.com/v1/documentation/api/1.21.32/reference/services/s3.html#S3.Client.get_object]
> that accepts a byte range (so you can download just a selected byte range
> instead of the whole file). If my Avro file is 100 MB, can the Avro library
> do a byte-range seek to download only part of the S3 object (i.e. the parts
> that contain the header/schema metadata and the row count)?
--
This message was sent by Atlassian Jira
(v8.20.1#820001)