wgtmac commented on PR #35825: URL: https://github.com/apache/arrow/pull/35825#issuecomment-1594105093
Thanks for bearing with me. IMO, the test needs to be improved to cover more cases:

- Data type: at least `string` and `list<string>` need to be covered.
- Encoding: dictionary-encoded and plain-encoded. Please make sure both values of `ArrowReaderProperties.read_dictionary` are tested. For example, we need to make sure dictionary-encoded values can be read via encoded or decoded form of arrow arrays. Same for plain-encoded case.
- Read both overflow and non-overflow cases with `use_large_binary_variant` = true. It would be good to also add a test to make sure it throws in the overflow case when `use_large_binary_variant` = false.

If building a roundtrip is not that easy, we can add a test file to parquet-testing.
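
To make the requested coverage concrete, here is a rough, library-free sketch of the test matrix. It does not call any Arrow or Parquet API. The names `Case`, `build_cases`, and `overflows` are hypothetical and exist only for illustration; `read_dictionary` and `use_large_binary_variant` are the properties named above. The sketch assumes "overflow" means the total string bytes exceed what 32-bit offsets can address (2^31 - 1), which is my reading of the case this PR targets.

```python
from itertools import product
from typing import List, NamedTuple

# Assumption: "overflow" means the value bytes no longer fit in 32-bit offsets.
INT32_MAX = 2**31 - 1


class Case(NamedTuple):
    """One combination from the requested test matrix (hypothetical helper)."""
    data_type: str
    encoding: str
    read_dictionary: bool
    overflow: bool
    use_large_binary_variant: bool

    @property
    def should_throw(self) -> bool:
        # The read is expected to fail only when the data overflows
        # and the large binary variant is not enabled.
        return self.overflow and not self.use_large_binary_variant


def build_cases() -> List[Case]:
    """Enumerate every combination of the five dimensions listed above."""
    return [
        Case(*combo)
        for combo in product(
            ("string", "list<string>"),   # data type
            ("dictionary", "plain"),      # Parquet encoding
            (True, False),                # read_dictionary
            (False, True),                # overflow
            (True, False),                # use_large_binary_variant
        )
    ]


def overflows(total_bytes: int) -> bool:
    """True when the byte count cannot be addressed with 32-bit offsets."""
    return total_bytes > INT32_MAX


if __name__ == "__main__":
    for case in build_cases():
        outcome = "throws" if case.should_throw else "reads"
        print(case, "->", outcome)
```

The full cross product is more than the minimum asked for here. If the overflow cases turn out to be too slow or memory-hungry to run in every combination, trimming them to a few representative ones would still match the list above.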
