Hi, I'm a support engineer interested in DataSourceV2.
Recently I had some pain troubleshooting whether pushdown was actually applied or not. I noticed that DataFrame's explain() method shows PushedFilters even for JSON. Whether the pushed filters are actually used depends entirely on the data source side, I believe. However, I would like Spark to have some way to confirm whether a specific pushdown is actually applied in the data source or not.

# Example

val df = spark.read.json("s3://sample_bucket/people.json")
df.printSchema()
df.filter($"age" > 20).explain()

root
 |-- age: long (nullable = true)
 |-- name: string (nullable = true)

== Physical Plan ==
*Project [age#47L, name#48]
+- *Filter (isnotnull(age#47L) && (age#47L > 20))
   +- *FileScan json [age#47L,name#48] Batched: false, Format: JSON, Location: InMemoryFileIndex[s3://sample_bucket/people.json], PartitionFilters: [], PushedFilters: [IsNotNull(age), GreaterThan(age,20)], ReadSchema: struct<age:bigint,name:string>

# Comments

As you can see, PushedFilters is shown even though the input data is JSON. In reality this pushdown is not used. I'm wondering whether this has already been discussed. If not, this is a chance to add such a feature in DataSourceV2, because it would require some API-level changes.

Warm regards,
Noritaka Sekiyama
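To illustrate the kind of API-level contract I have in mind, here is a minimal, self-contained sketch (my own toy types, not Spark's actual classes) of a pushdown negotiation where the source explicitly reports which filters it accepted. A reader that cannot push anything down, like a JSON scan, would then report an empty pushed-filter list, and explain() could show "requested" and "actually applied" filters separately:

```scala
// Toy filter representation; in Spark these would be
// org.apache.spark.sql.sources.Filter subclasses.
sealed trait Filter
case class IsNotNull(col: String) extends Filter
case class GreaterThan(col: String, value: Long) extends Filter

// Hypothetical contract: offer filters to the source; it returns the
// residual filters it CANNOT handle, and separately reports the ones
// it actually applied, so the planner can display the truth.
trait PushDownReporting {
  def pushFilters(filters: Seq[Filter]): Seq[Filter] // returns residual
  def pushedFilters: Seq[Filter]                     // actually applied
}

// A JSON-like scan: accepts nothing, so everything is residual and
// pushedFilters stays empty -- explain() would not claim pushdown.
class JsonLikeScan extends PushDownReporting {
  def pushFilters(filters: Seq[Filter]): Seq[Filter] = filters
  def pushedFilters: Seq[Filter] = Nil
}

// A columnar-format-like scan: evaluates the filters itself, so the
// residual is empty and pushedFilters reports them all.
class ColumnarLikeScan extends PushDownReporting {
  private var accepted: Seq[Filter] = Nil
  def pushFilters(filters: Seq[Filter]): Seq[Filter] = {
    accepted = filters
    Nil
  }
  def pushedFilters: Seq[Filter] = accepted
}
```

With something like this, the physical plan could print only the filters the source confirms it applied, instead of everything the optimizer offered.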