[
https://issues.apache.org/jira/browse/ARROW-15875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17506271#comment-17506271
]
Dewey Dunnington commented on ARROW-15875:
------------------------------------------
It looks like this is accessible via {{RandomAccessFile::ReadMetadata()}} (see
the {{ETag}} field in the output below). That isn't too hard to implement, but
it doesn't expose the value via {{GetFileInfo()}} as Carl suggested. Would
exposing the {{ReadMetadata()}} method be enough for your use-case?
{code:R}
# remotes::install_github("paleolimbot/arrow/r@r-file-metadata")
library(arrow, warn.conflicts = FALSE)
bucket <- s3_bucket("ursa-labs-taxi-data")
file <- bucket$OpenInputFile("2019/06/data.parquet")
file$ReadMetadata()
#> $`Content-Length`
#> [1] "120790979"
#>
#> $`Content-Type`
#> [1] "application/x-www-form-urlencoded; charset=utf-8"
#>
#> $ETag
#> [1] "\"f1efd5d76cb82861e1542117bfa52b90-8\""
#>
#> $`Last-Modified`
#> [1] "2020-01-17T16:26:28Z"
{code}
(see
https://github.com/apache/arrow/compare/master...paleolimbot:r-file-metadata )
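One caveat when using the {{ETag}} as a checksum: as far as I know, S3 only
returns a plain MD5 digest for objects uploaded in a single part, while
multipart uploads get an ETag of the form {{<hex>-<number of parts>}}, which is
not the MD5 of the whole file. A minimal sketch of how the result could be
checked, assuming the metadata comes back as the named list printed above
({{parse_etag()}} is a hypothetical helper, not part of arrow, and the list is
hard-coded so this runs without S3 access):
{code:R}
# Metadata as printed above, hard-coded so the sketch runs without S3 access.
metadata <- list(
  `Content-Length` = "120790979",
  `Content-Type` = "application/x-www-form-urlencoded; charset=utf-8",
  ETag = "\"f1efd5d76cb82861e1542117bfa52b90-8\"",
  `Last-Modified` = "2020-01-17T16:26:28Z"
)

# Hypothetical helper (not part of arrow): strip the surrounding double
# quotes and report what shape the ETag has.
parse_etag <- function(metadata) {
  etag <- gsub("^\"|\"$", "", metadata[["ETag"]])
  list(
    etag = etag,
    # Exactly 32 hex digits: plausibly an MD5 digest of the object.
    looks_like_md5 = grepl("^[0-9a-fA-F]{32}$", etag),
    # "<32 hex digits>-<n>": multipart upload, not an MD5 of the whole object.
    is_multipart = grepl("^[0-9a-fA-F]{32}-[0-9]+$", etag)
  )
}

result <- parse_etag(metadata)
str(result)
{code}
For the file above the ETag ends in {{-8}}, so it would be flagged as a
multipart upload and would not be directly comparable to a local md5 sum.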
> [R][C++] Include md5sum in S3 method for GetFileInfo()
> ------------------------------------------------------
>
> Key: ARROW-15875
> URL: https://issues.apache.org/jira/browse/ARROW-15875
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++, R
> Affects Versions: 7.0.0
> Reporter: Carl Boettiger
> Priority: Major
>
> GetFileInfo() seems to include mtime, size, path and type. For an S3 system,
> it would be nice to be able to reference the md5 sum without transferring the
> file, (which I think the server will have already computed?). This seems
> like the logical place to include it (though I wouldn't object to a more
> visible method too).
>
>
> (though type isn't clear to me, since it appears to be an integer)