Hi Experts,

Could you please allow me to pick your brains on the following?

For managed Hive tables, the scan operator is FileSourceScanExec. Is there any particular reason why the FileFormat field of its underlying HadoopFsRelation does not implement an interface like SupportsRuntimeFiltering? Like the Scan contained in BatchScanExec, FileSourceScanExec could also benefit from pushdown of runtime filters to skip chunks while reading, say, Parquet data.

The reason I ask is that I have personally been working on pushing down a BroadcastHashJoin's build-side keys (converted to a SortedSet) as a runtime filter to the Iceberg Scan/DataSource layer, for filtering at various stages, something akin to DPP but for non-partitioned columns (https://github.com/apache/spark/pull/49209). I am thinking of doing the same for Hive-based relations, using Parquet for starters. I believe Parquet has min/max statistics available per chunk, and I want to utilize them for pruning. I know this works fine for Iceberg-formatted data, and I was wondering whether you see any issue in doing the same for FileSourceScanExec with Parquet-format data?

Regards,
Asif
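To make the idea concrete, here is a minimal, self-contained sketch of the pruning step I have in mind. It does not use Spark's or Parquet's actual APIs (all class and method names here are illustrative): it just shows how per-row-group min/max statistics combined with a sorted set of build-side join keys can rule out row groups whose [min, max] range contains none of the keys.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class RowGroupPruningSketch {

    // Minimal stand-in for a Parquet row group's column statistics.
    record RowGroupStats(int index, long min, long max) {}

    // Keep only row groups whose [min, max] range overlaps at least one build-side key.
    static List<Integer> pruneRowGroups(List<RowGroupStats> groups, TreeSet<Long> buildKeys) {
        List<Integer> survivors = new ArrayList<>();
        for (RowGroupStats g : groups) {
            // ceiling(min) is the smallest key >= min; if it is also <= max,
            // the row group may contain matching rows and must be read.
            Long candidate = buildKeys.ceiling(g.min());
            if (candidate != null && candidate <= g.max()) {
                survivors.add(g.index());
            }
        }
        return survivors;
    }

    public static void main(String[] args) {
        List<RowGroupStats> groups = List.of(
            new RowGroupStats(0, 1, 100),    // contains key 42 in range -> kept
            new RowGroupStats(1, 200, 300),  // no key falls in range    -> pruned
            new RowGroupStats(2, 500, 900)   // contains key 700 in range -> kept
        );
        TreeSet<Long> buildKeys = new TreeSet<>(List.of(42L, 700L, 1000L));
        System.out.println(pruneRowGroups(groups, buildKeys)); // prints [0, 2]
    }
}
```

Because the keys are in a sorted set, each row group costs one log-time ceiling lookup rather than a scan over all build-side keys, which is why I convert the build side to a SortedSet before pushing it down.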
For Hive Tables ( managed), the scan operator is FileSourceScanExec. Is there any particular reason why its underlying HadoopFSRelations' field, FileFormat does not implement an interface like SupportsRuntimeFiltering ? Like Scan contained in BatchScanExec, FileSourceScanExec may also benefit from pushdown of run time filters in skipping chunks white reading say Parquet Format? The reason for my asking is that I have been working ,personally, on pushdown of BrodacastHashJoin's buildside set (converted to SortedSet) and pushed as a Runtime Filter to iceberg as Scan DataSource layer , for filtering at various stages ( something akin to DPP but for non partitioned columns) , (https://github.com/apache/spark/pull/49209 ). I am thinking of doing the same for Hive based relations , using Parquet ( for starts). I believe parquet has max/min data available per chunk , and want to utilize it for pruning. I know that it works fine for iceberg formatted data, and was wondering if you see any issue in doing the same for FileSourceScanExec with parquet format data? Regards Asif