coderfender commented on code in PR #3558:
URL: https://github.com/apache/datafusion-comet/pull/3558#discussion_r2843211792


##########
spark/src/main/scala/org/apache/comet/iceberg/IcebergReflection.scala:
##########
@@ -19,8 +19,89 @@
 
 package org.apache.comet.iceberg
 
+import java.lang.reflect.Method
+
 import org.apache.spark.internal.Logging
 
+/** Cached Iceberg classes and methods to avoid repeated reflection lookups. */
+case class ReflectionCache(
+    contentScanTaskClass: Class[_],
+    fileScanTaskClass: Class[_],
+    contentFileClass: Class[_],
+    deleteFileClass: Class[_],
+    schemaParserClass: Class[_],
+    schemaClass: Class[_],
+    partitionSpecParserClass: Class[_],
+    partitionSpecClass: Class[_],
+    fileMethod: Method,
+    startMethod: Method,
+    lengthMethod: Method,
+    partitionMethod: Method,
+    residualMethod: Method,
+    taskSchemaMethod: Method,
+    deletesMethod: Method,
+    specMethod: Method,
+    schemaToJsonMethod: Method,
+    specToJsonMethod: Method,
+    deleteContentMethod: Method,
+    deleteSpecIdMethod: Method,
+    deleteEqualityIdsMethod: Method)

Review Comment:
   I am not sure if this _cache_  is truly helpful.  Here are my concerns 
   1. We have ~ 20 + fields making it difficult to maintain and manage
   2. We are sometimes caching methods vs classes . Can we be consistent and 
perhaps logically bucket this ?
   3. What happens if a Class or method not is not found ? Would we rather 
throw an error or fail silently ? What do you think is the path of least 
resistance here ?
   4. Why did we extends Logging for the companion object ? 
   5. Why factory instead of singleton approach ? Perhaps a single instance is 
useful across ?
   6. Can we guarantee that this is thread safe and only populated when iceberg 
is involved
   7. Are we catering  / handling all supported iceberg versions  and their 
signatures?
   8. nit : might also want to rename class to indicate that it is in the 
Iceberg subsystem



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to