RameshkumarChikoti123 opened a new issue, #12152:
URL: https://github.com/apache/hudi/issues/12152
Added two record keys (`customer_id`, `name`) and configured the record index as below:
```python
hudi_options = {
    'hoodie.table.name': "hudi-table-with-rli-two-record-keys",
    'hoodie.datasource.write.recordkey.field': "customer_id,name",
    'hoodie.datasource.write.partitionpath.field': "state",
    'hoodie.datasource.write.precombine.field': "created_at",
    'hoodie.datasource.write.operation': "upsert",  # use upsert operation
    'hoodie.index.type': "RECORD_INDEX",
    'hoodie.metadata.enable': "true",
    'hoodie.metadata.index.column.stats.enable': "true",
    'hoodie.metadata.record.index.enable': "true",
}

df.write.format("hudi").options(**hudi_options).mode("append").save(
    "s3a://bucket/var/proj/hudipoc-proj/hudi-table-with-rli-two-record-key/"
)
```
**Reading records with composite keys**

```python
spark.read.format("hudi") \
    .option("hoodie.enable.data.skipping", "true") \
    .option("hoodie.metadata.enable", "true") \
    .option("hoodie.metadata.record.index.enable", "true") \
    .option("hoodie.metadata.index.column.stats.enable", "true") \
    .load("s3a://bucket/var/proj/hudipoc-proj/hudi-table-with-rli-two-record-key/") \
    .createOrReplaceTempView("hudi_snapshot1")

spark.sql("""
    select * from hudi_snapshot1
    where customer_id = '04da8419-fb9e-47f1-a44f-3cf2199ad20a'
      and name = 'Customer_43680'
""").show(truncate=False)
```
**Observations**:
Spark scans all partitions, as shown in the attached image.
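For context on what the record index has to match against: with multiple record key fields, Hudi's `ComplexKeyGenerator` encodes the record key as `field1:value1,field2:value2` in the `_hoodie_record_key` meta column, and (to my understanding) a record-index point lookup needs an equality predicate on every key field so the full key string can be reconstructed. A minimal sketch of that encoding, assuming the default complex-key format (the `complex_record_key` helper is hypothetical, for illustration only; verify against your Hudi version):

```python
def complex_record_key(key_fields, row):
    # Hypothetical helper: mimics how Hudi's ComplexKeyGenerator
    # encodes a composite record key as "field1:value1,field2:value2"
    # (assumption: default encoding; check your Hudi version).
    return ",".join(f"{field}:{row[field]}" for field in key_fields)

key = complex_record_key(
    ["customer_id", "name"],
    {"customer_id": "04da8419-fb9e-47f1-a44f-3cf2199ad20a",
     "name": "Customer_43680"},
)
# key == "customer_id:04da8419-fb9e-47f1-a44f-3cf2199ad20a,name:Customer_43680"
```

If the query above still scans all partitions despite constraining both key fields, that would point at the index lookup not being applied rather than at the predicate.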
