coderfender commented on code in PR #3558: URL: https://github.com/apache/datafusion-comet/pull/3558#discussion_r2843211792
########## spark/src/main/scala/org/apache/comet/iceberg/IcebergReflection.scala: ########## @@ -19,8 +19,89 @@ package org.apache.comet.iceberg +import java.lang.reflect.Method + import org.apache.spark.internal.Logging +/** Cached Iceberg classes and methods to avoid repeated reflection lookups. */ +case class ReflectionCache( + contentScanTaskClass: Class[_], + fileScanTaskClass: Class[_], + contentFileClass: Class[_], + deleteFileClass: Class[_], + schemaParserClass: Class[_], + schemaClass: Class[_], + partitionSpecParserClass: Class[_], + partitionSpecClass: Class[_], + fileMethod: Method, + startMethod: Method, + lengthMethod: Method, + partitionMethod: Method, + residualMethod: Method, + taskSchemaMethod: Method, + deletesMethod: Method, + specMethod: Method, + schemaToJsonMethod: Method, + specToJsonMethod: Method, + deleteContentMethod: Method, + deleteSpecIdMethod: Method, + deleteEqualityIdsMethod: Method) Review Comment: I am not sure if this _cache_ is truly helpful or perhaps if the reward is worth the complexity being introduced. Here are my concerns and I would love to know your opinion 1. We have ~ 20 + fields making it difficult to maintain and manage 2. We are sometimes caching methods vs classes . Can we be consistent and perhaps logically bucket this ? 3. What happens if a Class or method not is not found ? Would we rather throw an error or fail silently ? What do you think is the path of least resistance here ? 4. Why did we extends Logging for the companion object ? 5. Why factory instead of singleton approach ? Perhaps a single instance is useful across ? 6. Can we guarantee that this is thread safe and only populated when iceberg is involved 7. Are we catering / handling all supported iceberg versions and their signatures? 8. nit : might also want to rename class to indicate that it is in the Iceberg subsystem -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
