zeroshade commented on a change in pull request #11538:
URL: https://github.com/apache/arrow/pull/11538#discussion_r783244925
##########
File path: go/parquet/file/column_writer_test.go
##########
@@ -45,6 +49,73 @@ const (
DictionaryPageSize = 1024 * 1024
)
+type mockpagewriter struct {
+ mock.Mock
+}
+
+func (m *mockpagewriter) Close(hasDict, fallBack bool) error {
+ return m.Called(hasDict, fallBack).Error(0)
+}
+func (m *mockpagewriter) WriteDataPage(page file.DataPage) (int64, error) {
+ args := m.Called(page)
+ return int64(args.Int(0)), args.Error(1)
+}
+func (m *mockpagewriter) WriteDictionaryPage(page *file.DictionaryPage)
(int64, error) {
+ args := m.Called(page)
+ return int64(args.Int(0)), args.Error(1)
+}
+func (m *mockpagewriter) HasCompressor() bool {
+ return m.Called().Bool(0)
+}
+func (m *mockpagewriter) Compress(buf *bytes.Buffer, src []byte) []byte {
+ return m.Called(buf, src).Get(0).([]byte)
+}
+func (m *mockpagewriter) Reset(sink utils.WriterTell, codec
compress.Compression, compressionLevel int, metadata
*metadata.ColumnChunkMetaDataBuilder, rgOrdinal, columnOrdinal int16,
metaEncryptor, dataEncryptor encryption.Encryptor) error {
+ return m.Called().Error(0)
+}
+
+func TestWriteDataPageV2NumRows(t *testing.T) {
Review comment:
Ok cool, I solved it by mucking with the batching logic to ensure that
the batches in `WriteBatch` always begin at a row boundary (writing slightly
smaller batches if necessary). this lets everything else stay the same since as
long as I guarantee that the batch I write starts with a new row, and we only
check to write the page after writing a batch, it guarantees that we never have
a row crossing page boundaries.
the only drawback is that doing it this way means that if you are writing a
repeated column with DataPageV2, `WriteBatch` must always start with a new row.
that keeps the guarantee. I'm ok with that though and I've updated the
documentation accordingly and added a test to ensure that batching makes sure
that rows don't cross page boundaries.
Assuming nothing goes wrong with the tests in the CI, i'll merge this after
everything passes.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]