Hans-Raintree opened a new issue, #9426:
URL: https://github.com/apache/hudi/issues/9426

   **Describe the problem you faced**
   
   When reading CDC logs, the partition path is included in the before/after 
columns but not as a top-level column, so I can't filter to specific 
partitions before reading the data, which makes reads more costly.
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. Write a table with
          'hoodie.table.cdc.enabled': 'true',
          'hoodie.table.cdc.supplemental.logging.mode': 'data_before_after'
      and partitioning enabled.
   2. Read the CDC logs with:
          'hoodie.datasource.query.type': 'incremental',
          'hoodie.datasource.read.begin.instanttime': begin_time,
          'hoodie.datasource.query.incremental.format': 'cdc'
   3. The result only has the op, ts_ms, before and after columns.
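
The steps above can be sketched in PySpark roughly as follows. This is a minimal sketch, not the exact job from the report: the helper names (`cdc_write_options`, `cdc_read_options`, `read_cdc`) and the `table_name`, `record_key`, `partition_field`, `precombine_field` and `base_path` arguments are hypothetical placeholders; only the CDC-related option keys are taken from the steps themselves.

```python
def cdc_write_options(table_name, record_key, partition_field, precombine_field):
    """Write options for a partitioned table with CDC logging enabled (step 1)."""
    return {
        # Standard Hudi write options; the values are placeholders (assumptions).
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.partitionpath.field": partition_field,
        "hoodie.datasource.write.precombine.field": precombine_field,
        # CDC options quoted in the report.
        "hoodie.table.cdc.enabled": "true",
        "hoodie.table.cdc.supplemental.logging.mode": "data_before_after",
    }


def cdc_read_options(begin_time):
    """Read options for an incremental CDC query (step 2)."""
    return {
        "hoodie.datasource.query.type": "incremental",
        "hoodie.datasource.read.begin.instanttime": begin_time,
        "hoodie.datasource.query.incremental.format": "cdc",
    }


def read_cdc(spark, base_path, begin_time):
    """Load the CDC view; needs a SparkSession with the Hudi bundle on the classpath."""
    return (
        spark.read.format("hudi")
        .options(**cdc_read_options(begin_time))
        .load(base_path)
    )
```

Calling `read_cdc(spark, base_path, begin_time).printSchema()` on a table written with these options should show the behavior from step 3: only `op`, `ts_ms`, `before` and `after` at the top level.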
   
   **Expected behavior**
   
   The partition path fields should also be exposed as separate top-level columns.
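
To illustrate the shape being asked for, here is a hedged sketch of lifting the partition field out of the row images after the read. It assumes the `before` and `after` columns arrive as JSON strings and that the partition field (a hypothetical `region` in the usage below) is present in them; both helper names are made up for illustration. This only reshapes the result: the filter still runs after the CDC data has been read, so it does not give the partition pruning this issue asks for.

```python
import json


def partition_value(before, after, field):
    """Plain-Python view of the logic: prefer `after`, fall back to `before` (deletes)."""
    for image in (after, before):
        if image:
            value = json.loads(image).get(field)
            if value is not None:
                return value
    return None


def with_partition_column(cdc_df, field):
    """Same logic on a Spark DataFrame: add `field` as a top-level string column."""
    # Imported lazily so the helper above works without pyspark installed.
    from pyspark.sql import functions as F

    return cdc_df.withColumn(
        field,
        F.coalesce(
            F.get_json_object(F.col("after"), f"$.{field}"),
            F.get_json_object(F.col("before"), f"$.{field}"),
        ),
    )
```

For example, `with_partition_column(cdc_df, "region").filter("region = 'eu'")` would narrow the result to one partition, but only after the full CDC read has been planned.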
   
   **Environment Description**
   
   * Hudi version : 0.13.1
   
   * Spark version : 3.3.2 / 3.4.0
   
   * Hive version : 3.1.3
   
   * Hadoop version : 2.7 / 3.3.3
   
   * Storage (HDFS/S3/GCS..) : S3 / Local
   
   * Running on Docker? (yes/no) : no
   
   
   **Additional context**
   
   Happens both in AWS EMR 6.12.0 and when running locally.

