avantgardnerio commented on code in PR #2885:
URL: https://github.com/apache/arrow-datafusion/pull/2885#discussion_r925850259
########## datafusion/core/tests/sql/mod.rs: ##########
@@ -499,6 +537,77 @@ async fn register_tpch_csv(ctx: &SessionContext, table: &str) -> Result<()> {
     Ok(())
 }

+async fn register_tpch_csv_data(
+    ctx: &SessionContext,
+    table_name: &str,
+    data: &str,
+) -> Result<()> {
+    let schema = Arc::new(get_tpch_table_schema(table_name));

Review Comment:
   I started with the TPC-H `.csv`s that were already checked in, added some of my own, then added data to the existing ones and updated failing tests with new expected results. At that point I recognized a familiar feeling: I was going down the road to hell that is shared test data. I see it being particularly bad for aggregates (what is the correct expected result for the sum of all sales of parts from the Middle East?). I started breaking the `.csv`s out by folder, but that seemed cumbersome, so I finally concluded that this function might be a useful tool for keeping the data close to the test itself, like this: https://github.com/spaceandtimelabs/arrow-datafusion/blob/563d87d1c6413e611894619c2bc472b396d75c3d/datafusion/core/tests/sql/subqueries.rs#L171

   I don't have strong opinions about the implementation, but I would like to avoid sharing one set of data between all integration tests.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at: us...@infra.apache.org
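The "keep the data next to the test" pattern discussed above can be sketched without any DataFusion dependency. This is a hedged, simplified illustration, not the PR's actual code: in the real helper, the inline CSV string would be handed to `register_tpch_csv_data(ctx, table_name, data)` together with a `SessionContext` so the table can be queried with SQL; here a hypothetical `parse_inline_csv` stands in to show the shape of the idea.

```rust
// Minimal, dependency-free sketch of keeping each test's data inline with
// the test, instead of sharing one big checked-in CSV across all tests.
// `parse_inline_csv` is a hypothetical stand-in for the PR's
// `register_tpch_csv_data` helper.

fn parse_inline_csv(data: &str) -> Vec<Vec<String>> {
    data.lines()
        .map(|line| line.trim())
        .filter(|line| !line.is_empty())
        .map(|line| {
            line.split(',')
                .map(|field| field.trim().to_string())
                .collect()
        })
        .collect()
}

fn main() {
    // Each test owns a tiny, purpose-built data set, so changing it cannot
    // break the expected results of unrelated tests.
    let orders = "
        1,alice,100
        2,bob,250
    ";
    let rows = parse_inline_csv(orders);
    assert_eq!(rows.len(), 2);
    assert_eq!(rows[1][2], "250");
    println!("rows parsed: {}", rows.len());
}
```

The trade-off the comment describes follows directly: with shared fixtures, adding one row to satisfy a new test silently changes the expected aggregates of every existing test that reads the same file; with inline data, each expected result is derivable by eye from the literal a few lines above it.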