[
https://issues.apache.org/jira/browse/AVRO-3261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17516399#comment-17516399
]
Michael A. Smith commented on AVRO-3261:
----------------------------------------
If your avro file is an [object container
file|https://avro.apache.org/docs/1.11.0/spec.html#Object+Container+Files],
then to know how large the header is in bytes, we'd need to know:
# The compression codec used
# The size of the schema
# The size of any other arbitrary metadata
The Avro spec doesn't seem to give us enough guarantees to know this
information up front, so I'm not sure how we could go about accomplishing it.
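As an illustration of why the header size is data-dependent, one *could* find it by parsing the header itself: fetch an initial byte range (e.g. via the {{Range}} parameter of boto3's {{get_object}}), try to parse, and fetch a larger range if the parse runs off the end. The sketch below, in plain Python with no S3 or Avro dependency, builds and parses a minimal object container header per the spec (magic, metadata map, 16-byte sync marker); the function names and the incremental-fetch idea are mine, not part of any Avro library:

```python
import io
import json

# -- Avro primitive: zigzag-varint "long" encoding ---------------------------

def write_long(buf, n):
    """Append an Avro zigzag-varint long to a binary buffer."""
    n = (n << 1) ^ (n >> 63)          # zigzag: small magnitudes -> few bytes
    while True:
        b = n & 0x7F
        n >>= 7
        if n:
            buf.write(bytes([b | 0x80]))
        else:
            buf.write(bytes([b]))
            return

def read_long(stream):
    """Read an Avro zigzag-varint long from a binary stream."""
    shift, acc = 0, 0
    while True:
        b = stream.read(1)[0]
        acc |= (b & 0x7F) << shift
        if not (b & 0x80):
            break
        shift += 7
    return (acc >> 1) ^ -(acc & 1)    # undo zigzag

# -- Object container file header --------------------------------------------

MAGIC = b"Obj\x01"

def build_header(schema_json, codec=b"null", sync=b"\x00" * 16):
    """Serialize a minimal OCF header: magic, metadata map, sync marker."""
    buf = io.BytesIO()
    buf.write(MAGIC)
    entries = {"avro.schema": schema_json.encode(), "avro.codec": codec}
    write_long(buf, len(entries))     # one map block holding all entries
    for key, value in entries.items():
        kb = key.encode()
        write_long(buf, len(kb)); buf.write(kb)
        write_long(buf, len(value)); buf.write(value)
    write_long(buf, 0)                # a zero count terminates the map
    buf.write(sync)
    return buf.getvalue()

def parse_header(stream):
    """Parse an OCF header; return (metadata dict, sync marker, header size)."""
    if stream.read(4) != MAGIC:
        raise ValueError("not an Avro object container file")
    meta = {}
    while True:
        count = read_long(stream)
        if count == 0:
            break
        if count < 0:                 # negative count: block byte size follows
            count = -count
            read_long(stream)
        for _ in range(count):
            key = stream.read(read_long(stream)).decode()
            meta[key] = stream.read(read_long(stream))
    sync = stream.read(16)
    return meta, sync, stream.tell()

schema = json.dumps({"type": "record", "name": "R",
                     "fields": [{"name": "x", "type": "long"}]})
header_bytes = build_header(schema)
meta, sync, size = parse_header(io.BytesIO(header_bytes))
print(size, meta["avro.codec"])      # header size depends on the schema length
```

Note that even with this, only the schema and codec come cheap: the per-block record counts live in every block header throughout the file, so a total row count would still require touching every block, not just the file header.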
> s3 byte range for just schema or row count
> ------------------------------------------
>
> Key: AVRO-3261
> URL: https://issues.apache.org/jira/browse/AVRO-3261
> Project: Apache Avro
> Issue Type: Wish
> Components: community, misc, python, tools
> Affects Versions: 1.11.0
> Reporter: t oo
> Priority: Major
>
> boto3 python library has [s3
> get_object|https://boto3.amazonaws.com/v1/documentation/api/1.21.32/reference/services/s3.html#S3.Client.get_object]
> that accepts a byte range (so you can download just a selected byte range
> instead of the whole file). If my Avro file is 100 MB, can the Avro library
> do a byte-range seek to download only part of the S3 object (i.e. the parts
> that contain the header/schema metadata and the row count)?
--
This message was sent by Atlassian Jira
(v8.20.1#820001)