xiarixiaoyao edited a comment on pull request #5168:
URL: https://github.com/apache/hudi/pull/5168#issuecomment-1082534853
@alexeykudinkin I have addressed the comment; the code has been modified as
you suggested.
However, I still have a question: do we really need to force
spark.sql.parquet.recordLevelFilter.enabled=true for MOR/COW snapshot queries?
Test preparation:
set spark.sql.parquet.enableVectorizedReader=false, since
spark.sql.parquet.recordLevelFilter.enabled conflicts with the vectorized reader.
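For reference, a minimal sketch of the configuration combination described in the test preparation. It is not taken from the PR. The object name, app name, and local master are assumptions for illustration. It relies on the documented Spark behaviour that record-level filtering only takes effect when filter pushdown is enabled and the vectorized reader is not used.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical standalone example; names below are illustrative only.
object RecordLevelFilterConf {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("record-level-filter-conf") // hypothetical app name
      .master("local[2]")                  // assumption: local run
      // Record-level filtering needs pushed-down filters to work with.
      .config("spark.sql.parquet.filterPushdown", "true")
      // It has no effect while the vectorized reader is in use.
      .config("spark.sql.parquet.enableVectorizedReader", "false")
      .config("spark.sql.parquet.recordLevelFilter.enabled", "true")
      .getOrCreate()

    // Print the effective values to confirm what the session will use.
    Seq(
      "spark.sql.parquet.filterPushdown",
      "spark.sql.parquet.enableVectorizedReader",
      "spark.sql.parquet.recordLevelFilter.enabled"
    ).foreach(k => println(s"$k = ${spark.conf.get(k)}"))

    spark.stop()
  }
}
```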
Here are the benchmark code and results:
```
prepareHoodieCowTable(tableName, new Path(f.getCanonicalPath, tableName).toUri.toString)
val benchmark = new HoodieBenchmark("perf cow snapshot read", 1000000)
benchmark.addCase("recordLevelFilter enable") { _ =>
  // spark.sessionState.conf.setConfString("spark.sql.parquet.enableVectorizedReader", "false")
  spark.sessionState.conf.setConfString("spark.sql.parquet.filterPushdown", "true")
  spark.sessionState.conf.setConfString("spark.sql.parquet.recordLevelFilter.enabled", "true")
  spark.sql(s"select c1, c3, c4, c5 from $tableName").where("c1 > 100000").count()
}
benchmark.addCase("recordLevelFilter disable") { _ =>
  spark.sessionState.conf.setConfString("spark.sql.parquet.filterPushdown", "true")
  spark.sessionState.conf.setConfString("spark.sql.parquet.recordLevelFilter.enabled", "false")
  spark.sql(s"select c1, c3, c4, c5 from $tableName").where("c1 > 100000").count()
}

perf cow snapshot read:      Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
--------------------------------------------------------------------------------------------------------
recordLevelFilter enable               693            751          69         1.4         693.1       1.0X
recordLevelFilter disable              662            680          27         1.5         662.4       1.0X
```
I don't see any performance improvement from setting
spark.sql.parquet.recordLevelFilter.enabled=true; in this run it was slightly
slower (best time 693 ms vs. 662 ms).
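
To put the two rows of the table side by side, here is a small sketch that computes the relative difference between the cases. The numbers are copied from the benchmark output above. The case class and object names are hypothetical and not part of Hudi or Spark.

```scala
// Hypothetical helper types; only the numbers come from the benchmark table.
final case class BenchCase(name: String, bestMs: Double, avgMs: Double, stdevMs: Double)

object RecordLevelFilterDelta {
  val enabled: BenchCase  = BenchCase("recordLevelFilter enable", 693, 751, 69)
  val disabled: BenchCase = BenchCase("recordLevelFilter disable", 662, 680, 27)

  // Relative change of `a` over baseline `b`, in percent.
  def pctDelta(a: Double, b: Double): Double = (a - b) / b * 100.0

  def main(args: Array[String]): Unit = {
    val bestDelta = pctDelta(enabled.bestMs, disabled.bestMs)
    val avgDelta  = pctDelta(enabled.avgMs, disabled.avgMs)
    val avgGapMs  = enabled.avgMs - disabled.avgMs

    println(f"best time: enabled is $bestDelta%.1f%% slower")
    println(f"avg time:  enabled is $avgDelta%.1f%% slower")
    // Compare the gap with the run-to-run spread of the noisier case.
    println(f"avg gap: $avgGapMs%.0f ms, stdev of enabled case: ${enabled.stdevMs}%.0f ms")
  }
}
```

With these inputs the enabled case comes out about 4.7% slower on best time and about 10.4% slower on average time, and the average gap (71 ms) is close to one standard deviation of the enabled case (69 ms). A single run like this therefore shows no improvement, but it is also too noisy to prove a real regression.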
--
This is an automated message from the Apache Git Service.