NGA-TRAN commented on code in PR #20246:
URL: https://github.com/apache/datafusion/pull/20246#discussion_r2785250661


##########
datafusion/sqllogictest/test_files/preserve_file_partitioning.slt:
##########
@@ -101,6 +101,29 @@ STORED AS PARQUET;
 ----
 4
 
+# Create hive-partitioned dimension table (3 partitions matching fact_table)
+# For testing Partitioned joins with matching partition counts
+query I
+COPY (SELECT 'dev' as env, 'log' as service)
+TO 'test_files/scratch/preserve_file_partitioning/dimension_partitioned/d_dkey=A/data.parquet'
+STORED AS PARQUET;
+----
+1
+
+query I
+COPY (SELECT 'prod' as env, 'log' as service)
+TO 'test_files/scratch/preserve_file_partitioning/dimension_partitioned/d_dkey=B/data.parquet'
+STORED AS PARQUET;
+----
+1
+
+query I
+COPY (SELECT 'prod' as env, 'log' as service)

Review Comment:
   So `d_dkey=C` and `d_dkey=B` contain exactly the same data: `env=prod` and `service=log`.
   
   I see now. A `d_dkey` column is added automatically from the partition directory name, with the value B or C, so the rows still differ and the results are still correct.
   But is that what you want? Would it be clearer if the two files contained different values? It is up to you; I don't have a concern here.
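To illustrate why the B and C rows still differ even though the Parquet files are identical: with hive-style partitioning, the `d_dkey` value comes from the `d_dkey=...` directory segment, not from the file contents. The Python sketch below is a simplified model of that derivation, not DataFusion's actual listing code. The helper `hive_partition_values` is hypothetical, and the `d_dkey=C/data.parquet` path is assumed to mirror the `B` layout shown in the diff.

```python
from pathlib import PurePosixPath


def hive_partition_values(path: str) -> dict[str, str]:
    """Collect key=value directory segments from a hive-style path.

    Hypothetical helper for illustration only; not DataFusion's implementation.
    """
    values = {}
    # Only directory components encode partition values, so skip the file name.
    for segment in PurePosixPath(path).parent.parts:
        key, sep, value = segment.partition("=")
        if sep and key:  # ignore plain directories such as "dimension_partitioned"
            values[key] = value
    return values


# Both files hold the same data row (per the review comment).
file_row = {"env": "prod", "service": "log"}

# The C path is assumed to follow the same layout as the B path.
paths = [
    "dimension_partitioned/d_dkey=B/data.parquet",
    "dimension_partitioned/d_dkey=C/data.parquet",
]

# Each scanned row gets the partition column appended from its directory.
rows = [{**file_row, **hive_partition_values(p)} for p in paths]

for row in rows:
    print(row)
```

Because the partition column is derived from the path, the two output rows differ only in `d_dkey`, which is why identical file contents do not break the test results.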



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
