Hi everyone, I've encountered a performance issue at multiple customers now. The problem is the processing of input paths when there are lots of partitions.
We check each directory to see whether it is empty. This alone can take minutes. There is a comment in Utilities:

    "We need to add a empty file, it is not acceptable to change the operator tree
    Consider the query:
    select * from (select count(1) from T union all select count(1) from T2) x;
    If T is empty and T2 contains 100 rows, the user expects: 0, 100 (2 rows)"

I have to admit that I don't quite understand that. Would it mean that we'd only get a single row if we left out this empty path? I do not understand the internals of query planning and execution well enough, but if someone has time to explain it to me I'd be very grateful. (If someone who understands all of this is based in Europe, I'd be more than happy to jump on a call as well, or to invite you for a steak & beer ;-) )

If that is indeed the reason: this code was written 3 or 4 years ago. Maybe the internals have changed enough that we can now deal with this differently?

For simple queries like SELECT * FROM T LIMIT 10 I'm seeing runtimes of 5-10 minutes just because of this overhead.

Thanks!
Lars
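
P.S. To make the question concrete, here is a toy sketch of how I currently read the comment. It is only my assumption, not how Hive actually works: I am guessing that each task emits its partial count when it closes, even if it read no rows, and that a branch with no input paths starts no task at all and so emits nothing. The names count_branch and union_all are made up for illustration.

```python
def count_branch(input_files):
    """Toy model of SELECT count(1) over a list of files (each a list of rows)."""
    # Assumption: one task per input file, and each task emits a partial count
    # when it closes, even if it read zero rows.
    partials = [len(rows) for rows in input_files]
    if not partials:
        # Assumption: no input paths means no task is started, so no row at all.
        return []
    return [sum(partials)]


def union_all(*branches):
    """Concatenate the rows produced by each branch."""
    rows = []
    for branch in branches:
        rows.extend(branch)
    return rows


t2_files = [[("row",)] * 100]  # T2: one file with 100 rows

# T has no input paths at all (the empty directory was simply dropped).
without_placeholder = union_all(count_branch([]), count_branch(t2_files))

# T is represented by one empty placeholder file.
with_placeholder = union_all(count_branch([[]]), count_branch(t2_files))

print(without_placeholder)  # [100]
print(with_placeholder)     # [0, 100]
```

If that reading is right, the placeholder file would only matter for plans that aggregate over the empty input, which is why I wonder whether it is still needed for a plain SELECT * ... LIMIT.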