[https://issues.apache.org/jira/browse/BEAM-14304?focusedWorklogId=762693&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-762693]
ASF GitHub Bot logged work on BEAM-14304:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 27/Apr/22 04:07
Start Date: 27/Apr/22 04:07
Worklog Time Spent: 10m
Work Description: lostluck commented on code in PR #17347:
URL: https://github.com/apache/beam/pull/17347#discussion_r859359352
##########
sdks/go/pkg/beam/io/parquetio/parquetio.go:
##########
@@ -118,6 +122,19 @@ func (a *parquetReadFn) ProcessElement(ctx context.Context, filename string, emi
return nil
}
+// Write writes a PCollection<parquetStruct> to a .parquet file.
+// Write expects a struct type t whose fields carry parquet tags.
+// For example:
+// type Student struct {
+// Name string `parquet:"name=name, type=BYTE_ARRAY, convertedtype=UTF8, encoding=PLAIN_DICTIONARY"`
+// Age int32 `parquet:"name=age, type=INT32, encoding=PLAIN"`
+// Id int64 `parquet:"name=id, type=INT64"`
+// Weight float32 `parquet:"name=weight, type=FLOAT"`
+// Sex bool `parquet:"name=sex, type=BOOLEAN"`
+// Day int32 `parquet:"name=day, type=INT32, convertedtype=DATE"`
+// Ignored int32 // no parquet tag, so this field is not written
+// }
+
Review Comment:
Ah, the blank line prevents associating the comment with the function.
The PR doesn't have "allow committers to make changes" set; otherwise I'd
commit the change myself and then merge the PR.
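The reviewer's point can be checked with Go's standard go/parser package: a comment group becomes a declaration's doc comment (ast.FuncDecl.Doc) only if it ends on the line immediately before the declaration, so a blank line in between leaves Doc nil. This is an illustrative sketch only; docFor and the two source snippets are hypothetical and are not part of parquetio.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// docFor parses src and returns the doc comment text attached to the
// function named name, or "" if the function has no associated doc comment.
func docFor(src, name string) (string, error) {
	fset := token.NewFileSet()
	// parser.ParseComments is required; without it comments are discarded.
	f, err := parser.ParseFile(fset, "example.go", src, parser.ParseComments)
	if err != nil {
		return "", err
	}
	for _, decl := range f.Decls {
		fn, ok := decl.(*ast.FuncDecl)
		if !ok || fn.Name.Name != name {
			continue
		}
		if fn.Doc == nil {
			return "", nil // comment exists in the file but is not attached
		}
		return fn.Doc.Text(), nil
	}
	return "", fmt.Errorf("function %q not found", name)
}

// Hypothetical snippet mirroring the PR: a blank line separates comment and func.
const withBlankLine = `package parquetio

// Write writes a PCollection to a .parquet file.

func Write() {}
`

// Hypothetical snippet with the blank line removed.
const attached = `package parquetio

// Write writes a PCollection to a .parquet file.
func Write() {}
`

func main() {
	d1, err := docFor(withBlankLine, "Write")
	if err != nil {
		panic(err)
	}
	d2, err := docFor(attached, "Write")
	if err != nil {
		panic(err)
	}
	fmt.Printf("blank line: %q\n", d1)
	fmt.Printf("attached:   %q\n", d2)
}
```

Deleting the "+" blank line so that the function declaration directly follows the comment is enough for the comment to be treated as Write's documentation.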
Issue Time Tracking
-------------------
Worklog Id: (was: 762693)
Time Spent: 1h 20m (was: 1h 10m)
> Implement parquetio for Go SDK
> ------------------------------
>
> Key: BEAM-14304
> URL: https://issues.apache.org/jira/browse/BEAM-14304
> Project: Beam
> Issue Type: New Feature
> Components: sdk-go
> Reporter: Nguyen Khoi Nguyen
> Priority: P2
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> The naive approach would be to read the whole Parquet file into memory,
> because processing Parquet files requires an io.Seeker.
> Alternatively, implement the filesystem.go interface to return an
> io.ReadSeekCloser, but that would not be trivial for GCS.
--
This message was sent by Atlassian Jira
(v8.20.7#820007)