[
https://issues.apache.org/jira/browse/BEAM-14304?focusedWorklogId=762692&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-762692
]
ASF GitHub Bot logged work on BEAM-14304:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 27/Apr/22 04:05
Start Date: 27/Apr/22 04:05
Worklog Time Spent: 10m
Work Description: lostluck commented on code in PR #17347:
URL: https://github.com/apache/beam/pull/17347#discussion_r859358611
##########
sdks/go/pkg/beam/io/parquetio/parquetio.go:
##########
@@ -118,6 +122,19 @@ func (a *parquetReadFn) ProcessElement(ctx context.Context, filename string, emi
return nil
}
+// Write writes a PCollection<parquetStruct> to .parquet file.
+// Write expects a type t of struct with parquet tags
+// For example:
+// type Student struct {
+// Name string `parquet:"name=name, type=BYTE_ARRAY, convertedtype=UTF8, encoding=PLAIN_DICTIONARY"`
+// Age int32 `parquet:"name=age, type=INT32, encoding=PLAIN"`
+// Id int64 `parquet:"name=id, type=INT64"`
+// Weight float32 `parquet:"name=weight, type=FLOAT"`
+// Sex bool `parquet:"name=sex, type=BOOLEAN"`
+// Day int32 `parquet:"name=day, type=INT32, convertedtype=DATE"`
+// Ignored int32 //without parquet tag and won't write
+// }
+
Review Comment:
```suggestion
// }
```
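The suggestion drops the blank line that follows the comment. The review does not state a reason; one likely motivation is that Go tooling treats a comment as a declaration's doc comment only when it ends on the line directly above that declaration. The sketch below checks this with the standard `go/parser` package. The helper name `hasDoc` and the sample sources are hypothetical and are not taken from the PR.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// hasDoc parses src and reports whether the first function declaration
// in it has a doc comment attached. Hypothetical helper for illustration.
func hasDoc(src string) bool {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "example.go", src, parser.ParseComments)
	if err != nil {
		panic(err)
	}
	for _, d := range f.Decls {
		if fn, ok := d.(*ast.FuncDecl); ok {
			return fn.Doc != nil
		}
	}
	return false
}

func main() {
	// Comment directly above the declaration.
	attached := "package p\n\n// Write writes things.\nfunc Write() {}\n"
	// Same comment, separated from the declaration by a blank line.
	detached := "package p\n\n// Write writes things.\n\nfunc Write() {}\n"

	fmt.Println("attached:", hasDoc(attached))
	fmt.Println("detached:", hasDoc(detached))
}
```

If this reading is right, keeping the blank line would leave the `Write` comment out of the generated package documentation.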
Issue Time Tracking
-------------------
Worklog Id: (was: 762692)
Time Spent: 1h 10m (was: 1h)
> Implement parquetio for Go SDK
> ------------------------------
>
> Key: BEAM-14304
> URL: https://issues.apache.org/jira/browse/BEAM-14304
> Project: Beam
> Issue Type: New Feature
> Components: sdk-go
> Reporter: Nguyen Khoi Nguyen
> Priority: P2
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> The naive approach would be to read the whole Parquet file into memory,
> because processing Parquet files requires an io.Seeker.
> Alternatively, implement the filesystem.go Interface so that it returns an
> io.ReadSeekCloser, but that would not be trivial for GCS.