zhengyuan-cn opened a new issue, #6596: URL: https://github.com/apache/hudi/issues/6596
ENV: Impala 4.0 + Hive 3.1.1 with Hudi 0.12.

Executing `select count(*) from tableName;` via the Impala shell returns 195264946 rows, which is fewer than the actual 217884008 rows. Spark SQL returns 217884008 rows, which is the correct result. I refreshed tableName multiple times, but the result is still incorrect. I replaced the Impala Hudi dependency jars (hudi-common-0.5.0-incubating.jar, hudi-hadoop-mr-0.5.0-incubating.jar) with (hudi-common-0.12.0.jar, hudi-hadoop-mr-0.12.0.jar), but the issue persists.

With ENV: Impala 4.0 + Hive 3.1.1 and Hudi 0.11, the result is correct.

**Environment Description**

* Hudi version : 0.12
* Spark version : spark-2.4.8
* Hive version : 3.1.1 (with the Impala that comes with it)
* Hadoop version : hadoop-3.2.2
* Storage (HDFS/S3/GCS..) : HDFS
* Running on Docker? (yes/no) : no

Additional :

Impala:

`[192.168.1.52:21000] hudi> refresh model_series_data_3;
Connection lost, reconnecting...
Opened TCP connection to 192.168.1.52:21000
Query: use `hudi`
Query: refresh model_series_data_3
Query submitted at: 2022-09-05 07:07:44 (Coordinator: http://192.168.10.52:25000)
Query progress can be monitored at: http://192.168.1.52:25000/query_plan?query_id=b34a6e2e71c0af91:2521ad2d00000000
Fetched 0 row(s) in 0.28s
[192.168.1.52:21000] hudi> select count(*) from model_series_data_3;
Query: select count(*) from model_series_data_3
Query submitted at: 2022-09-05 07:07:46 (Coordinator: http://192.168.10.52:25000)
Query progress can be monitored at: http://192.168.1.52:25000/query_plan?query_id=f848080d361104ad:ebb3af9a00000000
+-----------+
| count(*)  |
+-----------+
| 195264946 |
+-----------+
Fetched 1 row(s) in 2.72s`

==================================================================

Spark :

`+---------+
| count(1)|
+---------+
|217884008|
+---------+
16:30:59,796 INFO AbstractConnector:381 - Stopped Spark@47da3952{HTTP/1.1, (http/1.1)}{0.0.0.0:4040}
16:30:59,797 INFO SparkUI:54 - Stopped Spark web UI at http://192.168.2.56:4040`

--
This is an automated message from the Apache Git Service.
