daniel-adam-tfs commented on code in PR #477: URL: https://github.com/apache/arrow-go/pull/477#discussion_r2284898520
##########
parquet/file/file_reader_test.go:
##########
@@ -927,3 +928,61 @@ func TestListColumns(t *testing.T) {
 		}
 	}
 }
+
+func BenchmarkReadInt32Column(b *testing.B) {
+	b.Skip("rle-dict-int32-snappy.parquet not available")
+
+	dir := os.Getenv("PARQUET_TEST_DATA")
+	if dir == "" {
+		dir = "../../parquet-testing/data"
+		b.Log("PARQUET_TEST_DATA not set, using ../../parquet-testing/data")
+	}
+	require.DirExists(b, dir)
+
+	filePath := filepath.Join(dir, "rle-dict-int32-snappy.parquet")
+	reader, err := file.OpenParquetFile(filePath, false)
+	if err != nil {
+		b.Fatalf("Expected no error while opening parquet file %q, got %v", filePath, err)
+	}
+	defer reader.Close()
+
+	int32ColIdx := reader.MetaData().Schema.Root().FieldIndexByName("int32")
+	if int32ColIdx < 0 {
+		b.Fatalf("Expected to find int32 column in schema, got index %d", int32ColIdx)
+	}
+
+	numValues := reader.NumRows()
+	values := make([]int32, numValues)
+	b.StopTimer()

Review Comment:
   I definitely shouldn't have three calls to `StopTimer` in this benchmark; I think I forgot to remove the one on line 961. The timer is already running when the benchmark function starts, and `ResetTimer` doesn't stop it. I want to measure just the `ReadBatch` call, so I stop the timer to keep the execution of lines 958-971 out of the benchmark timing. Or do you think we should include it?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
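
For reference, the timer behaviour described in the comment can be sketched in a few lines. This is a minimal, standalone illustration and not the code from the PR: `readBatch` and `benchmarkReadOnly` are hypothetical names, and `readBatch` is just a slice copy standing in for the real `ReadBatch` call. It assumes one-time setup before the loop and per-iteration setup inside it, and uses a single `StopTimer`/`StartTimer` pair per iteration. The iteration count is fixed through the `test.benchtime` flag only to keep the run short.

```go
package main

import (
	"flag"
	"fmt"
	"testing"
)

// readBatch is a hypothetical stand-in for the column reader's ReadBatch
// call: it copies src into dst and reports how many values were copied.
func readBatch(src, dst []int32) int {
	return copy(dst, src)
}

// benchmarkReadOnly times only readBatch. The timer is already running when
// the benchmark function is entered, so one-time setup is discarded with
// ResetTimer and per-iteration setup is bracketed by StopTimer/StartTimer.
func benchmarkReadOnly(b *testing.B) {
	// One-time setup (assumed; in the PR this would be opening the file).
	src := make([]int32, 1<<16)
	for i := range src {
		src[i] = int32(i)
	}

	// Zero the elapsed time. This does not stop the timer.
	b.ResetTimer()

	for i := 0; i < b.N; i++ {
		// Pause the timer so the per-iteration setup is not measured.
		b.StopTimer()
		dst := make([]int32, len(src))
		// Resume the timer so only the call below is measured.
		b.StartTimer()

		if n := readBatch(src, dst); n != len(src) {
			b.Fatalf("expected %d values, got %d", len(src), n)
		}
	}
}

func main() {
	// Register the testing flags and fix the iteration count so the
	// standalone run stays short.
	testing.Init()
	if err := flag.Set("test.benchtime", "200x"); err != nil {
		panic(err)
	}

	res := testing.Benchmark(benchmarkReadOnly)
	fmt.Println(res.N, "iterations,", res.NsPerOp(), "ns/op")
}
```

Note that `StopTimer` and `StartTimer` have a cost of their own when called on every iteration, so whether excluding the setup is worth it likely depends on how expensive that setup is relative to the call being measured.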