pan3793 commented on issue #5317:
URL: https://github.com/apache/kyuubi/issues/5317#issuecomment-1727717727

   I have basically figured out what happened. The key point is that 
`HivePartitionReaderFactory` holds Hive's `Table` and `Partition` objects, 
which cannot actually be serialized.
   
   ```
   case class HivePartitionReaderFactory(
       ...,
       hiveTable: HiveTable,
       ...
       partFileToHivePart: Map[PartitionedFile, HivePartition],
       ...)
     extends PartitionReaderFactory with Logging {
     ...
   }
   ```
   
   `PartitionReaderFactory` is constructed on the Driver and must be 
serialized and sent to the Executor, where it constructs the `PartitionReader`. 
After some thought, I see two approaches to address this issue:
   
   1. Let `HivePartitionReaderFactory` hold the catalyst's `Table` and 
`Partition` instances instead of Hive's, and then convert the catalyst's 
instances to Hive's on the Executor side.
   
   2. Erase the cached fields of Hive's `Table` and `Partition` instances 
before using them to construct `HivePartitionReaderFactory`. (Actually, I don't 
understand why Hive did not mark those fields as `transient`.)
   ```
   package org.apache.hadoop.hive.ql.metadata;
   ...
   
   public class Table implements Serializable {
     ...
     /**
      * These fields are all cached fields.  The information comes from tTable.
      */
     private Deserializer deserializer;
     private Class<? extends OutputFormat> outputFormatClass;
     private Class<? extends InputFormat> inputFormatClass;
     private Path path;
     ...
   }
   
   public class Partition implements Serializable {
     ...
     /**
      * These fields are cached. The information comes from tPartition.
      */
     private Deserializer deserializer;
     private Class<? extends OutputFormat> outputFormatClass;
     private Class<? extends InputFormat> inputFormatClass;
     ...
   }
   ```
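
To make approach 2 concrete, here is a minimal, self-contained sketch of the underlying Java serialization behavior. It does not use Hive's classes: `FakeTable` and `FakeDeserializer` are hypothetical stand-ins, and `eraseCachedFields` is an illustrative helper name, not a Hive or Kyuubi API. It only assumes that the cached field holds an object that is not `Serializable`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class CachedFieldSketch {

    // Hypothetical stand-in for a cached object that is NOT Serializable.
    static class FakeDeserializer {
        final String name;
        FakeDeserializer(String name) { this.name = name; }
    }

    // Hypothetical stand-in for a Serializable class with a non-transient cached field.
    static class FakeTable implements Serializable {
        private static final long serialVersionUID = 1L;
        final String serdeName;                // source of truth (plays the role of tTable)
        private FakeDeserializer deserializer; // cache derived from serdeName

        FakeTable(String serdeName) { this.serdeName = serdeName; }

        // Lazily rebuilds the cache from the serializable metadata.
        FakeDeserializer getDeserializer() {
            if (deserializer == null) {
                deserializer = new FakeDeserializer(serdeName);
            }
            return deserializer;
        }

        // Approach 2: drop the cache before the object is shipped.
        void eraseCachedFields() { deserializer = null; }
    }

    // Returns false when Java serialization rejects the object graph.
    static boolean canSerialize(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Serializes and deserializes, mimicking the Driver-to-Executor hop.
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T roundTrip(T value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        FakeTable table = new FakeTable("LazySimpleSerDe");
        table.getDeserializer();                      // cache gets populated on the "Driver"
        System.out.println(canSerialize(table));      // populated cache blocks serialization

        table.eraseCachedFields();                    // approach 2
        FakeTable copy = roundTrip(table);            // now the object can be shipped
        System.out.println(copy.getDeserializer().name); // cache is rebuilt on the "Executor"
    }
}
```

Declaring the cached field `transient` would have the same effect without the explicit erase step, since transient fields are skipped during serialization and come back as `null`. That is why the missing `transient` modifier is surprising.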


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

