Hi All,

The current Parquet filters handle missing columns (columns that are not in the file) as if their values were all null. This is completely logical. The question is how Parquet filtering should handle columns that are in the file (with real values) but missing from the projection.

While implementing column indexes, I thought this situation was clear. The projection restricts the visible columns to the ones specified by the user, so columns that are in the file but not in the projection should be handled the same way as columns that are not in the file. This is how column index filtering is implemented. (To guarantee that only the correct records are retrieved, we need to read the columns in the filter and check the values one by one.)

The problem is that the other filters (the dictionary and statistics filters) do not take the projection into account. Because of that, Parquet 1.11.0 introduced a regression when filtering on columns that are not in the projection (but are in the file).

I think column index filtering works correctly, but I am curious about your opinions.
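
To make the discrepancy concrete, here is a toy Python sketch of the two behaviours. It is not parquet-mr code: the ColumnStats class, the function names, the column names and the min/max values are all made up for illustration, and the only predicate modelled is an equality check against a non-null value.

```python
from dataclasses import dataclass


@dataclass
class ColumnStats:
    # Hypothetical stand-in for the min/max statistics of one column chunk.
    min_value: int
    max_value: int


def might_match_eq(stats, value):
    """Return True if a chunk could contain `value`; stats=None means 'all null'."""
    if stats is None:
        # An all-null column can never satisfy `col == <non-null value>`.
        return False
    return stats.min_value <= value <= stats.max_value


def projection_unaware_keeps(file_stats, column, value):
    # Looks only at what is stored in the file and ignores the projection.
    return might_match_eq(file_stats.get(column), value)


def projection_aware_keeps(file_stats, projection, column, value):
    # A column outside the projection is treated like a column missing from the file.
    visible = file_stats.get(column) if column in projection else None
    return might_match_eq(visible, value)


# Column "x" exists in the file with real values but is not in the projection.
file_stats = {"id": ColumnStats(1, 100), "x": ColumnStats(0, 10)}
projection = {"id"}

keeps_unaware = projection_unaware_keeps(file_stats, "x", 5)
keeps_aware = projection_aware_keeps(file_stats, projection, "x", 5)
missing_in_file = projection_unaware_keeps(file_stats, "z", 5)

print(keeps_unaware, keeps_aware, missing_in_file)  # prints: True False False
```

In this model the two approaches agree for a column that is truly absent from the file ("z"), but disagree for a column that is in the file and only absent from the projection ("x"), which is the case in question.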
Thanks a lot,
Gabor
