leosanqing opened a new issue, #9687:
URL: https://github.com/apache/hudi/issues/9687

   
   
   **Describe the problem you faced**
   
   I understand that the rt and ro tables serve different purposes.
   In Hive, the rt table serves incremental and snapshot queries (log + parquet 
files), while the ro table serves read-optimized queries (parquet files only).
   
   Semantically, however, the rt table can do everything the ro table does. If 
the log files are filtered out when querying the rt table directly, the result 
is the same as querying the ro table.
   
   I tried this by adding a config that is checked in the 
HoodieParquetRealtimeInputFormat#getRecordReader method. It decides whether to 
create a HoodieRealtimeRecordReader (reads log + parquet) or to call the 
superclass method and return its reader directly (reads only the parquet 
files).
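
   A minimal sketch of that switch, to make the idea concrete. It uses small 
stand-in classes rather than the real Hudi and Hive types, so the reader 
classes, their signatures, and the config key `hoodie.realtime.skip.log.files` 
are all hypothetical and only illustrate the dispatch, not the actual patch.

```java
import java.util.HashMap;
import java.util.Map;

public class ReaderSelectionSketch {
    // Hypothetical config key, invented for this sketch; not a real Hudi option.
    public static final String SKIP_LOG_FILES_KEY = "hoodie.realtime.skip.log.files";

    /** Stand-in for a Hive RecordReader; only reports what it would read. */
    public interface RecordReader {
        String describe();
    }

    /** Stand-in for the reader returned by the parquet-only (ro) input format. */
    public static class ParquetOnlyReader implements RecordReader {
        public String describe() { return "parquet"; }
    }

    /** Stand-in for HoodieRealtimeRecordReader (merges log files with parquet). */
    public static class RealtimeReader implements RecordReader {
        public String describe() { return "log+parquet"; }
    }

    /** Stand-in for the superclass input format used by the ro table. */
    public static class ParquetInputFormat {
        public RecordReader getRecordReader(Map<String, String> conf) {
            return new ParquetOnlyReader();
        }
    }

    /** Stand-in for HoodieParquetRealtimeInputFormat with the extra switch. */
    public static class RealtimeInputFormat extends ParquetInputFormat {
        @Override
        public RecordReader getRecordReader(Map<String, String> conf) {
            boolean skipLogFiles =
                Boolean.parseBoolean(conf.getOrDefault(SKIP_LOG_FILES_KEY, "false"));
            if (skipLogFiles) {
                // Behave like the ro table: delegate to the superclass reader.
                return super.getRecordReader(conf);
            }
            // Default rt behaviour: merge log files with the base parquet file.
            return new RealtimeReader();
        }
    }

    /** Helper that builds a config and reports which reader would be chosen. */
    public static String describeReader(boolean skipLogFiles) {
        Map<String, String> conf = new HashMap<>();
        conf.put(SKIP_LOG_FILES_KEY, Boolean.toString(skipLogFiles));
        return new RealtimeInputFormat().getRecordReader(conf).describe();
    }

    public static void main(String[] args) {
        System.out.println(describeReader(false));
        System.out.println(describeReader(true));
    }
}
```

   With the flag unset the rt path is unchanged, so existing queries would not 
be affected by such a switch.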
   
   Since this works in principle, why do we need to keep the ro table?
   
   
   **Expected behavior**
   
   A clear and concise description of what you expected to happen.
   
   **Environment Description**
   
   * Hudi version : 0.13
   
   * Spark version :
   
   * Hive version : 3.2
   
   * Hadoop version :
   
   * Storage (HDFS/S3/GCS..) :
   
   * Running on Docker? (yes/no) : no
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.