Re: [PR] chore: Log warning for tests that are not really testing dictionary-encoded Parquet files [datafusion-comet]

via GitHub Wed, 18 Sep 2024 14:04:34 -0700


comphead commented on code in PR #752:
URL: https://github.com/apache/datafusion-comet/pull/752#discussion_r1765720506



##########
spark/src/test/scala/org/apache/spark/sql/CometTestBase.scala:
##########
@@ -377,6 +382,36 @@ abstract class CometTestBase
         .write
         .option("parquet.enable.dictionary", withDictionary.toString)
         .parquet(file.getCanonicalPath)
+
+      if (withDictionary) {
+        // if the test specified to write dictionary-encoded data, we should check that we actually wrote some
+        // dictionary-encoded data
+        val files = file.listFiles(new FilenameFilter {

Review Comment:
   A nit: I think this code checks whether any column has RLE encoding, but 
isn't RLE applied at the row-group level? That would mean any page of any 
column may or may not be encoded. If, say, column C1 is encoded and C2 is not, 
no warning is shown, even though C2 may be exactly the column we use for testing.
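
   To illustrate the concern, here is a minimal sketch of a per-column check. 
The `ColumnChunk` and `RowGroup` case classes and the `DictionaryCheck` object 
are hypothetical stand-ins for the footer metadata; they are not the 
parquet-hadoop API and not the code in this PR. A real check would presumably 
read each column chunk's encodings from the Parquet file footer instead.

```scala
// Hypothetical, simplified model of Parquet footer metadata.
// These types are assumptions for illustration only.
final case class ColumnChunk(column: String, encodings: Set[String])
final case class RowGroup(chunks: Seq[ColumnChunk])

object DictionaryCheck {
  // Encoding names Parquet uses for dictionary-encoded data pages.
  val DictionaryEncodings: Set[String] = Set("PLAIN_DICTIONARY", "RLE_DICTIONARY")

  private def isDictionaryEncoded(chunk: ColumnChunk): Boolean =
    chunk.encodings.intersect(DictionaryEncodings).nonEmpty

  // Coarse check: true if any chunk of any column reports a dictionary encoding.
  def anyDictionary(rowGroups: Seq[RowGroup]): Boolean =
    rowGroups.flatMap(_.chunks).exists(isDictionaryEncoded)

  // Finer check: names of columns with at least one chunk that reports
  // no dictionary encoding at all.
  def columnsWithoutDictionary(rowGroups: Seq[RowGroup]): Set[String] =
    rowGroups.flatMap(_.chunks).filterNot(isDictionaryEncoded).map(_.column).toSet

  // The C1/C2 case from the comment: C1 is dictionary-encoded, C2 is not.
  val example: Seq[RowGroup] = Seq(
    RowGroup(Seq(
      ColumnChunk("C1", Set("RLE_DICTIONARY", "RLE")),
      ColumnChunk("C2", Set("PLAIN", "RLE")))))
}
```

   With `DictionaryCheck.example`, the coarse check passes because of C1, so 
no warning would be logged, while the per-column check still reports C2. 
Warning per column is one option; whether it is worth the extra noise in test 
logs is a judgment call.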



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


