paleolimbot commented on code in PR #36992:
URL: https://github.com/apache/arrow/pull/36992#discussion_r1290485420


##########
r/tests/testthat/test-parquet.R:
##########
@@ -484,3 +484,43 @@ test_that("Can read Parquet files from a URL", {
   expect_true(tibble::is_tibble(pu))
   expect_identical(dim(pu), c(10L, 11L))
 })
+
+test_that("thrift string and container size can be specified when reading 
Parquet files", {
+
+  tf <- tempfile()
+  on.exit(unlink(tf))
+  table <- arrow_table(example_data)
+  write_parquet(table, tf)
+  file <- make_readable_file(tf)
+  on.exit(file$close())
+
+  # thrift string size
+  reader_props <- ParquetReaderProperties$create()
+  reader_props$set_thrift_string_size_limit(1)
+  expect_identical(reader_props$thrift_string_size_limit(), 1L)
+
+  # We get an error if we set the Thrift string size limit too small
+  expect_error(ParquetFileReader$create(file, reader_props = reader_props), 
"TProtocolException: Exceeded size limit")
+
+  # Increase the size and we can read successfully
+  reader_props$set_thrift_string_size_limit(10000)
+  reader <- ParquetFileReader$create(file, reader_props = reader_props)
+  data <- reader$ReadTable()
+  expect_identical(collect.ArrowTabular(data), example_data)

Review Comment:
   That's a good point...we don't have a separate code path that an error would 
trigger (there are some places where we use the fact that an error was 
generated to do something else, so to get 100% test coverage we need to trigger 
an error). 
   
   I think it's reasonable to add that test here because it verifies that the 
changes in this PR do, in fact fix the user issue, and, in the dataset case, it 
makes sure that the value actually makes it from `open_dataset()` to its final 
C++ home. You could also make the argument that we're testing C++ and so we 
should remove that test...I think either are fine (but `expect_error()` without 
a regex or class argument is not!).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to