[ 
https://issues.apache.org/jira/browse/BEAM-14304?focusedWorklogId=762693&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-762693
 ]

ASF GitHub Bot logged work on BEAM-14304:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 27/Apr/22 04:07
            Start Date: 27/Apr/22 04:07
    Worklog Time Spent: 10m 
      Work Description: lostluck commented on code in PR #17347:
URL: https://github.com/apache/beam/pull/17347#discussion_r859359352


##########
sdks/go/pkg/beam/io/parquetio/parquetio.go:
##########
@@ -118,6 +122,19 @@ func (a *parquetReadFn) ProcessElement(ctx context.Context, filename string, emi
        return nil
 }
 
+// Write writes a PCollection<parquetStruct> to .parquet file.
+// Write expects a type t of struct with parquet tags
+// For example:
+// type Student struct {
+//   Name    string  `parquet:"name=name, type=BYTE_ARRAY, convertedtype=UTF8, encoding=PLAIN_DICTIONARY"`
+//   Age     int32   `parquet:"name=age, type=INT32, encoding=PLAIN"`
+//   Id      int64   `parquet:"name=id, type=INT64"`
+//   Weight  float32 `parquet:"name=weight, type=FLOAT"`
+//   Sex     bool    `parquet:"name=sex, type=BOOLEAN"`
+//   Day     int32   `parquet:"name=day, type=INT32, convertedtype=DATE"`
+//   Ignored int32   //without parquet tag and won't write
+// }
+

Review Comment:
   Ah, the blank line between the comment and the function prevents the doc 
comment from being associated with the function.
   
   The PR doesn't have "allow committers to make changes" set, otherwise I'd 
commit the change and then merge the PR.
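
The effect described in the review comment can be reproduced with the standard `go/parser` package. The sketch below is illustrative only: the source snippet and the names `Attached`, `Detached`, and `docFor` are hypothetical and do not come from the PR. It parses two functions and reports which one ends up with a doc comment.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// src is a hypothetical snippet: one comment sits directly above its
// function, the other is separated from its function by a blank line.
const src = `package demo

// Attached documents the function directly below it.
func Attached() {}

// Detached is separated from its function by a blank line.

func Detached() {}
`

// docFor parses source and returns, for every top-level function,
// the doc comment text that the parser associated with it.
func docFor(source string) (map[string]string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "demo.go", source, parser.ParseComments)
	if err != nil {
		return nil, err
	}
	docs := make(map[string]string)
	for _, d := range f.Decls {
		fn, ok := d.(*ast.FuncDecl)
		if !ok {
			continue
		}
		// Doc is nil when no comment is attached; Text() on a nil
		// comment group returns the empty string.
		docs[fn.Name.Name] = fn.Doc.Text()
	}
	return docs, nil
}

func main() {
	docs, err := docFor(src)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, name := range []string{"Attached", "Detached"} {
		fmt.Printf("%s: doc=%q\n", name, docs[name])
	}
}
```

In this sketch `Detached` gets an empty doc string, which is why removing the blank line is enough for the comment to show up in generated documentation.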





Issue Time Tracking
-------------------

    Worklog Id:     (was: 762693)
    Time Spent: 1h 20m  (was: 1h 10m)

> Implement parquetio for Go SDK
> ------------------------------
>
>                 Key: BEAM-14304
>                 URL: https://issues.apache.org/jira/browse/BEAM-14304
>             Project: Beam
>          Issue Type: New Feature
>          Components: sdk-go
>            Reporter: Nguyen Khoi Nguyen
>            Priority: P2
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> The naive approach would be to read the whole parquet file into memory, 
> because processing parquet files requires io.Seeker.
> The alternative is to implement the filesystem.go interface to return 
> io.ReadSeekCloser, but that would not be trivial for GCS.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
