zeroshade commented on code in PR #14989:
URL: https://github.com/apache/arrow/pull/14989#discussion_r1084714563


##########
go/parquet/file/column_writer_types.gen.go.tmpl:
##########
@@ -141,9 +141,9 @@ func (w *{{.Name}}ColumnChunkWriter) 
WriteBatchSpaced(values []{{.name}}, defLev
     }
 
     if w.bitsBuffer != nil {
-      w.writeValuesSpaced(vals, info.batchNum, w.bitsBuffer.Bytes(), 0)
+      w.writeValuesSpaced(vals, batch, w.bitsBuffer.Bytes(), 0)
     } else {
-      w.writeValuesSpaced(vals, info.batchNum, validBits, 
validBitsOffset+valueOffset)
+      w.writeValuesSpaced(vals, batch, validBits, validBitsOffset+valueOffset)

Review Comment:
   Looks like there's a reason I did it this way in the first place, haha!
   
   So, there's probably a more elegant solution, but for a quick fix i was able 
to solve the failed tests by passing *both* `batch` and `info.batchNum` and 
then modifying `writeValuesSpaced` to take in both, with `batch` being the 
current `numValues` and `info.batchNum` (probably should rename this, but not 
necessary for this PR) being passed as `numRead`.  The new `writeValuesSpaced` 
looked like so:
   
   ```go
   func (w *{{.Name}}ColumnChunkWriter) writeValuesSpaced(spacedValues 
[]{{.name}}, numRead, numValues int64, validBits []byte, validBitsOffset int64) 
{
   if len(spacedValues) != int(numRead) {
       w.currentEncoder.(encoding.{{.Name}}Encoder).PutSpaced(spacedValues, 
validBits, validBitsOffset)
     } else {
       w.currentEncoder.(encoding.{{.Name}}Encoder).Put(spacedValues)
     }
     if w.pageStatistics != nil {
       nulls := int64(len(spacedValues)) - numValues
       
w.pageStatistics.(*metadata.{{.Name}}Statistics).UpdateSpaced(spacedValues, 
validBits, validBitsOffset, nulls)
     }
   }
   ```
   
   The issue was that since `len(spacedValues) == int(numRead)` it wasn't 
calling `PutSpaced` when it should be (which will remove the null values 
according to the bitmap and only write the non-null values as per the parquet 
spec).
   
   *sigh* There's gotta be a more elegant way to do this, but at least this 
works.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to