kbendick commented on a change in pull request #1429:
URL: https://github.com/apache/iceberg/pull/1429#discussion_r484591685



##########
File path: 
core/src/main/java/org/apache/iceberg/BaseMetastoreTableOperations.java
##########
@@ -75,7 +76,14 @@ public TableMetadata refresh() {
     try {
       doRefresh();
     } catch (NoSuchTableException e) {
-      LOG.warn("Could not find the table during refresh, setting current 
metadata to null", e);
+      // Adjust log level according to the type of exception, as it might be 
intentional to call
+      // the method without determining the type of table in prior (like 
SparkSessionCatalog),
+      // and in such case it may not be an error case if the table is not an 
Iceberg table.
+      if (e instanceof NoSuchIcebergTableException) {

Review comment:
       Would it be possible to add into the comment that in general 
`NoSuchTableException` is not necessarily unexpected, and that in the future we 
can consider lowering the log level for both? If possible, could you also put 
an example where you believe this might happen? 
   
   My thinking is that this way, instead of the knowledge being left here in 
the PR, it will remain in the codebase and be more likely to get looked at 
again in the future.

##########
File path: 
core/src/main/java/org/apache/iceberg/BaseMetastoreTableOperations.java
##########
@@ -75,7 +76,14 @@ public TableMetadata refresh() {
     try {
       doRefresh();
     } catch (NoSuchTableException e) {
-      LOG.warn("Could not find the table during refresh, setting current 
metadata to null", e);
+      // Adjust log level according to the type of exception, as it might be 
intentional to call
+      // the method without determining the type of table in prior (like 
SparkSessionCatalog),
+      // and in such case it may not be an error case if the table is not an 
Iceberg table.
+      if (e instanceof NoSuchIcebergTableException) {

Review comment:
       I'm also curious about the code flow here and hoping you can help me 
better understand it possibly @HeartSaVioR.
   
   I see that the call to `refresh` here calls the pr`doRefresh` which is not 
implemented in this abstract class `BaseMetastoreTableOperations`, and I can't 
find any corresponding code that does implement `doRefresh` which extends this 
class (or even anywhere). However, I see similar classes that inherit from the 
higher up `TableOperations` interface which this also inherits from which 
implements their own version of `refresh` which does not rely on this abstract 
class. The example I'm thinking of is  
[HadoopTableOperations](https://github.com/apache/iceberg/blob/c28d1c8ba6322af07f0206c4cf3fdad875f37ac1/core/src/main/java/org/apache/iceberg/hadoop/HadoopTableOperations.java#L92).
   
   In any case, if this call to `doRefresh` throws a `NoSuchTableException`, 
currently it logs it, resets some of the private fields to null / their 
respective empty values (e.g. -1 for `version` in this abstract class when it's 
an int, though in `HadoopTableOperations` which implements `TableOperations` 
it's not an `int` but a `volatile Integer` whose empty value is `null`). 
   
   I guess my question is, I can't quite see where this call to this particular 
version of `refresh` is being accessed. When do we use 
`BaseMetastoreTableOperations` as opposed to one of the other implementations 
of `TableOperations` such as `HadoopTableOperations`? How can I determine when 
a specific `TableOperations` is being used? Additionally, as this class is 
marked as abstract but yet I can't find anything that extends / implements it, 
why do we have it? I also see that there is a python version of this class, so 
is it possible that the python version of the class is why this particular 
abstract class exists?
   
   Additionally, In the `log.debug` case you're adding, I imagine that it is 
likely still necessary to set the current metadata and related fields to null / 
their empty values. However, do we still intend to rethrow the exception like 
it currently is done? Looking through the code, I think it would possibly break 
things if we don't rethrow, such as 
[BaseTransaction#commitReplaceTransaction(boolean 
orCreate)](https://github.com/apache/iceberg/blob/454101c3573acb9cd94d6d9a306ed99a5a324ed9/core/src/main/java/org/apache/iceberg/BaseTransaction.java#L273)
 in the case that we're expecting the table to already exist (e.g. when the 
transaction is not allowed to create the table, that is if the transaction is 
not of type `CREATE_OR_REPLACE_TABLE` when calling `commitReplaceTransaction`).
   
   So I have a few questions:
   1) Why do we have this abstract base class `BaseMetastoreTableOperations` if 
nothing implements / extends it? I've mentioned a few reasons above, 
particularly the corresponding python class which is somewhat more complete.
   2) How can we be sure when this version of `TableOperations` is in play vs 
any other? I'm assuming that when we're connected to an actual Hive Metastore 
(vs using Hadoop filesystem and table properties for the metadata), then this 
class comes into play, but since it's abstract and not extended I fail to see 
how that's possible.
   3) The current behavior is to rethrow the exception. In the case that we are 
using the debug log, do we know that the upstream callers will catch this and 
not additionally add their own logs of a higher level and/or are we concerned 
that we are rethrowing the `NoSuchIcebergTableException`?
   
   Please let me know if you need me to clean up / reexplain some of my 
questioning, as I know it's somewhat convoluted. I'm doing my best to 
understand when this class comes into play and how but a little bit of 
assistance from somebody who knows much more than I do (or a link to possible 
documentation about it) would really help me out. Thanks for all of the work 
you put in @HeartSaVioR!




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to