duc-dn opened a new issue, #7683:
URL: https://github.com/apache/hudi/issues/7683

   **Describe the problem you faced**
   
   I am using Hudi Kafka Connect to consume data from a Kafka topic and write 
it to a Hudi table stored on MinIO. The table is also synced to the Hive 
metastore. When I query the table with Trino and count its records, the result 
only includes the records from the latest commit instead of all records in the 
table.
   
   **To Reproduce**
   - Steps to reproduce the behavior:
   1. Send data to the Kafka topic.
   2. Add the Hudi sink connector to Kafka Connect (on each commit, it syncs 
with the Hive metastore).
   3. Use Trino to query the data.
   - This is my sink config file:
   ```
   {
       "name": "hudi-sink-demo",
       "config": {
                "bootstrap.servers": "broker:9092",
                "connector.class": 
"org.apache.hudi.connect.HoodieSinkConnector",
                "tasks.max": "4",
                "control.topic.name": "hudi-control-topic",
                "key.converter": 
"org.apache.kafka.connect.storage.StringConverter",
                "value.converter": 
"org.apache.kafka.connect.storage.StringConverter",
                "value.converter.schemas.enable": "false",
                "topics": "ux_click_demo",
                "hoodie.table.name": "ux_click",
                "hoodie.table.type": "COPY_ON_WRITE",
                "hoodie.base.path": "s3a://datalake/ux_click",
                "hoodie.datasource.write.recordkey.field": "_id",
                "hoodie.datasource.write.partitionpath.field": 
"screen_size_type",
                "hoodie.datasource.write.keygenerator.type":"COMPLEX",
                "hoodie.compact.inline.max.delta.commits":2,    
                "fs.s3a.fast.upload": "true",   
                "fs.s3a.access.key": "minioadmin",
                "fs.s3a.secret.key": "minioadmin",
                "fs.s3a.path.style.access": "true",
                "fs.s3a.endpoint": "http://minio:9000",
                "hoodie.schemaprovider.class": 
"org.apache.hudi.schema.SchemaRegistryProvider",
                "hoodie.deltastreamer.schemaprovider.registry.url": 
"http://schema-registry:8081/subjects/ux_click/versions/latest",
                "hoodie.kafka.commit.interval.secs": 180,
                "hoodie.metadata.enable": "true",
                "hoodie.metadata.validate": "true",
                "hoodie.meta.sync.enable": "true",
                "hoodie.meta.sync.classes": "org.apache.hudi.hive.HiveSyncTool",
                "hoodie.datasource.hive_sync.table": "ux_click",
                "hoodie.datasource.hive_sync.partition_fields": 
"screen_size_type",                             
                "hoodie.datasource.hive_sync.partition_extractor_class": 
"org.apache.hudi.hive.MultiPartKeysValueExtractor",
                "hoodie.datasource.hive_sync.use_jdbc": "false",
                "hoodie.datasource.hive_sync.mode": "hms",
                "hive.metastore.uris": "thrift://hivemetastore:9083",
                "hive.metastore.client.socket.timeout": "1500s"
         }
   }
   ```
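
For reference, step 2 amounts to posting this JSON to the Kafka Connect REST endpoint `POST /connectors`. Below is a minimal sketch of that request in Python. The worker address `http://localhost:8083` and the helper name `build_connector_request` are assumptions for illustration and are not part of my setup description above.

```python
import json
import urllib.request


def build_connector_request(config_text, connect_url="http://localhost:8083"):
    """Build (but do not send) a POST /connectors request for Kafka Connect.

    connect_url is an assumed worker address; adjust it to your deployment.
    """
    # json.loads raises ValueError on malformed JSON, so a broken config
    # is caught before anything is sent to the Connect worker.
    payload = json.loads(config_text)
    return urllib.request.Request(
        url=connect_url.rstrip("/") + "/connectors",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would submit the connector; the sketch stops short of that so it can be checked without a running worker.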
   **Expected behavior**
   - I expect to be able to query all records in the ux_click table using Trino.
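
The two counts being compared can be written as the queries below. This is only a sketch: the catalog and schema names (`hive`, `default`) and the helper `count_queries` are assumptions, and `_hoodie_commit_time` is the commit-time metadata column that Hudi adds to each record by default. Grouping by it shows how many records each visible commit contributes.

```python
def count_queries(table="ux_click", catalog="hive", schema="default"):
    """Return (total_count_sql, per_commit_count_sql) for a Hudi table.

    The catalog and schema defaults are assumed, not taken from the issue.
    """
    qualified = f"{catalog}.{schema}.{table}"
    # Total number of records Trino can see in the table.
    total = f"SELECT count(*) AS total_records FROM {qualified}"
    # Records per Hudi commit, using the _hoodie_commit_time metadata column.
    per_commit = (
        "SELECT _hoodie_commit_time, count(*) AS records "
        f"FROM {qualified} "
        "GROUP BY _hoodie_commit_time "
        "ORDER BY _hoodie_commit_time"
    )
    return total, per_commit
```

If the per-commit query lists only a single commit time, Trino is seeing only the files written by that commit.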
   
   **Environment Description**
   * Hudi version : 0.12
   * Hive version : 2.3.2
   * Storage (HDFS/S3/GCS..) : S3
   * Running on Docker? (yes/no) : yes
   * Trino version: 351
   
   

