tanmayrauth commented on code in PR #1113:
URL: https://github.com/apache/iceberg-go/pull/1113#discussion_r3291925524
##########
table/arrow_utils.go:
##########
@@ -1704,3 +1714,43 @@ func positionDeleteRecordsToDataFiles(ctx
context.Context, rootLocation string,
return partitionWriter.Write(ctx, workers)
}
+
+func positionDeleteRecordsToDataFilesDV(ctx context.Context, rootLocation
string, args recordWritingArgs) iter.Seq2[iceberg.DataFile, error] {
+ return func(yield func(iceberg.DataFile, error) bool) {
+ writer := dv.NewDVWriter(args.fs)
+
+ for batch, err := range args.itr {
+ if err != nil {
+ yield(nil, err)
+
+ return
+ }
+
+ filePaths := batch.Column(0).(*array.String)
+ positions := batch.Column(1).(*array.Int64)
+
+ for i := range batch.NumRows() {
+ writer.Add(filePaths.Value(int(i)),
[]int64{positions.Value(int(i))})
+ }
+ }
+
+ u := uuid.Must(uuid.NewRandom())
+ if args.writeUUID != nil {
+ u = *args.writeUUID
+ }
+ location := rootLocation + "/data/" +
fmt.Sprintf("dv-%s.puffin", u)
+
+ dataFiles, err := writer.Flush(ctx, location)
+ if err != nil {
+ yield(nil, err)
+
+ return
+ }
Review Comment:
Restructured. The DV producer now drains the iterator (early-return on
iterator error), returns early if no positions were added (no UUID, no
location, no Flush), then generates the UUID and location and calls Flush, with
an early-return on Flush error before any successful yield. UUID and location
work is fully skipped on the empty path.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]