adamjoneill opened a new issue #1324: Presto - select * from table does not work
URL: https://github.com/apache/incubator-hudi/issues/1324

**_Tips before filing an issue_**

- Have you gone through our [FAQs](https://cwiki.apache.org/confluence/display/HUDI/FAQ)?
- Join the mailing list to engage in conversations and get faster support at [email protected].
- If you have triaged this as a bug, then file an [issue](https://issues.apache.org/jira/projects/HUDI/issues) directly.

**Describe the problem you faced**

I have a Parquet record created with Hudi from a Spark Kinesis stream and stored in S3. An AWS Glue table is generated from this record. I updated the table's input format to `org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat` as per the instructions at https://cwiki.apache.org/confluence/display/HUDI/Migration+Guide+From+com.uber.hoodie+to+org.apache.hudi

From the presto-cli I run

```
presto-cli --catalog hive --schema my-schema --server my-server:8889
presto:my-schema> select * from table
```

This returns

```
Query 20200211_185222_00050_hej8h, FAILED, 1 node
Splits: 17 total, 0 done (0.00%)
0:01 [0 rows, 0B] [0 rows/s, 0B/s]

Query 20200211_185222_00050_hej8h failed: No value present
```

However, when I run

```
select id from table
```

it returns

```
    id
----------
 34551832
(1 row)

Query 20200211_185250_00051_hej8h, FINISHED, 1 node
Splits: 17 total, 17 done (100.00%)
0:00 [1 rows, 93B] [2 rows/s, 213B/s]
```

Is this expected behaviour, or is there an underlying issue with the setup between Hudi, AWS Glue and Presto?

**To Reproduce**

Steps to reproduce the behavior:

1. Create a Spark job that reads from a Kinesis stream
2. Save the record to S3 using Hudi
3. Have an AWS Glue job catalog the directory
4. Using presto-cli, query the database created by AWS Glue

**Expected behavior**

All rows should be returned, as they are when querying a Parquet record using Spark without Hudi.

**Environment Description**

* Hudi version : hudi-spark-bundle:0.5.0-incubating (with spark-avro_2.11:2.4.4)
* Spark version : 2.4.4
* Hive version : Hive 2.3.6
* Hadoop version : Hadoop distribution: Amazon 2.8.5
* Storage (HDFS/S3/GCS..) : S3
* Running on Docker? (yes/no) : no

**Additional context**

**Stacktrace**

```Add the stacktrace of the error.```
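For reference, step 2 of the reproduction (saving the record to S3 with Hudi) might be configured roughly as follows. This is a minimal Python sketch under stated assumptions, not the reporter's actual job: the helper `hudi_write_options`, the table name, the field names and the S3 path are all hypothetical. Only the `hoodie.*` option keys and the `org.apache.hudi` format name are standard Hudi datasource settings.

```python
def hudi_write_options(table_name, record_key, partition_field, precombine_field):
    """Build the option map for a Hudi DataFrame write.

    All argument values are placeholders chosen by the caller; this helper
    is illustrative and not part of Hudi itself.
    """
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.partitionpath.field": partition_field,
        "hoodie.datasource.write.precombine.field": precombine_field,
        "hoodie.datasource.write.operation": "upsert",
    }


# Hypothetical table and column names, for illustration only.
options = hudi_write_options("my_table", "id", "partition_col", "updated_at")

# With a SparkSession and the hudi-spark-bundle on the classpath, the write
# would look roughly like this (not executed here; the S3 path is made up):
#
#   df.write.format("org.apache.hudi") \
#       .options(**options) \
#       .mode("append") \
#       .save("s3://my-bucket/path/to/my_table")
```

Posting the real option map used by the job alongside the Glue table definition would make it easier to tell whether the table type written by Spark matches the input format configured on the Glue table.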
