[ https://issues.apache.org/jira/browse/BEAM-14304?focusedWorklogId=761569&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-761569 ]
ASF GitHub Bot logged work on BEAM-14304:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 25/Apr/22 03:03
Start Date: 25/Apr/22 03:03
Worklog Time Spent: 10m
Work Description: nguyennk92 commented on PR #17347:
URL: https://github.com/apache/beam/pull/17347#issuecomment-1108017052
> Expanding the filesystem to support a *Seeker variant for reading is a
> great idea. We might not always be able to get an efficient implementation
> (it'll depend on the underlying filesystem), but we can always implement the
> inefficient approach (actually reading the data to get to the desired spot) as
> a stop gap if needed.
Yes, I'm working on that idea. The only real obstacle is getting `gcs`
to return an `io.ReadSeekCloser` from `OpenRead`. I have an "inefficient
solution" of sorts, and I will open a separate PR for it alone, since I don't
think it is trivial. I'm also reading the Java version for ideas on
implementing a more efficient Seeker variant.
Issue Time Tracking
-------------------
Worklog Id: (was: 761569)
Time Spent: 1h (was: 50m)
> Implement parquetio for Go SDK
> ------------------------------
>
> Key: BEAM-14304
> URL: https://issues.apache.org/jira/browse/BEAM-14304
> Project: Beam
> Issue Type: New Feature
> Components: sdk-go
> Reporter: Nguyen Khoi Nguyen
> Priority: P2
> Time Spent: 1h
> Remaining Estimate: 0h
>
> The naive approach would be to read the whole parquet file into memory,
> because processing parquet files requires an io.Seeker.
> Alternatively, implement the filesystem.go Interface so that it returns an
> io.ReadSeekCloser, but that would not be trivial for gcs.
--
This message was sent by Atlassian Jira
(v8.20.7#820007)