paleolimbot commented on code in PR #36992:
URL: https://github.com/apache/arrow/pull/36992#discussion_r1290485420
##########
r/tests/testthat/test-parquet.R:
##########
@@ -484,3 +484,43 @@ test_that("Can read Parquet files from a URL", {
expect_true(tibble::is_tibble(pu))
expect_identical(dim(pu), c(10L, 11L))
})
+
+test_that("thrift string and container size can be specified when reading
Parquet files", {
+
+ tf <- tempfile()
+ on.exit(unlink(tf))
+ table <- arrow_table(example_data)
+ write_parquet(table, tf)
+ file <- make_readable_file(tf)
+ on.exit(file$close())
+
+ # thrift string size
+ reader_props <- ParquetReaderProperties$create()
+ reader_props$set_thrift_string_size_limit(1)
+ expect_identical(reader_props$thrift_string_size_limit(), 1L)
+
+ # We get an error if we set the Thrift string size limit too small
+ expect_error(ParquetFileReader$create(file, reader_props = reader_props),
"TProtocolException: Exceeded size limit")
+
+ # Increase the size and we can read successfully
+ reader_props$set_thrift_string_size_limit(10000)
+ reader <- ParquetFileReader$create(file, reader_props = reader_props)
+ data <- reader$ReadTable()
+ expect_identical(collect.ArrowTabular(data), example_data)
Review Comment:
That's a good point...we don't have a separate code path that an error would
trigger (there are some places where we use the fact that an error was
generated to do something else, so to get 100% test coverage we need to trigger
an error).
I think it's reasonable to add that test here because it verifies that the
changes in this PR do, in fact fix the user issue, and, in the dataset case, it
makes sure that the value actually makes it from `open_dataset()` to its final
C++ home. You could also make the argument that we're testing C++ and so we
should remove that test...I think either are fine (but `expect_error()` without
a regex or class argument is not!).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]