[GitHub] [incubator-hudi] adamjoneill opened a new issue #1324: Presto - select * from table does not work

GitBox Tue, 11 Feb 2020 13:05:27 -0800

URL: https://github.com/apache/incubator-hudi/issues/1324
 
 
   **_Tips before filing an issue_**
   
   - Have you gone through our [FAQs](https://cwiki.apache.org/confluence/display/HUDI/FAQ)?
   
   - Join the mailing list to engage in conversations and get faster support at [email protected].
   
   - If you have triaged this as a bug, then file an [issue](https://issues.apache.org/jira/projects/HUDI/issues) directly.
   
   **Describe the problem you faced**
   
   I have a Parquet record, written by Hudi from a Spark job consuming a Kinesis stream, stored in S3.
   
   An AWS Glue table is generated from this record. I update the table's input format to org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat, following the instructions at https://cwiki.apache.org/confluence/display/HUDI/Migration+Guide+From+com.uber.hoodie+to+org.apache.hudi
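
For reference, the input-format change can also be scripted instead of edited in the Glue console. The sketch below is a hypothetical helper (`with_input_format` is not part of boto3 or Hudi). It assumes the dict shape returned by boto3's Glue `get_table()` call (`response["Table"]`) and builds a `TableInput` for `update_table()`. It copies only a subset of writable keys, because `get_table()` also returns read-only fields such as `CreateTime` that `TableInput` does not accept.

```python
import copy

# Input format class named in this issue (Hudi realtime input format).
HUDI_REALTIME_INPUT_FORMAT = "org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat"

# Subset of keys accepted in a Glue TableInput; read-only fields returned by
# get_table() (CreateTime, UpdateTime, DatabaseName, ...) are deliberately dropped.
WRITABLE_TABLE_KEYS = (
    "Name", "Description", "Owner", "Retention", "StorageDescriptor",
    "PartitionKeys", "TableType", "Parameters",
)


def with_input_format(table: dict, input_format: str) -> dict:
    """Return a TableInput-style copy of `table` with a new InputFormat.

    Hypothetical helper: `table` is assumed to look like
    glue_client.get_table(...)["Table"]. The input dict is not modified.
    """
    table_input = {k: copy.deepcopy(table[k]) for k in WRITABLE_TABLE_KEYS if k in table}
    table_input.setdefault("StorageDescriptor", {})["InputFormat"] = input_format
    return table_input


# Usage against a real catalog (not executed here; needs boto3 and AWS credentials):
#   glue = boto3.client("glue")
#   table = glue.get_table(DatabaseName="my-schema", Name="table")["Table"]
#   glue.update_table(DatabaseName="my-schema",
#                     TableInput=with_input_format(table, HUDI_REALTIME_INPUT_FORMAT))
```

Copying a whitelist rather than deleting known read-only keys keeps the request valid even if `get_table()` returns extra fields.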
   
   From the presto-cli I run:
   
   ```
   presto-cli --catalog hive --schema my-schema --server my-server:8889
   presto:my-schema> select * from table;
   ```
   
   This returns:
   
   ```
   Query 20200211_185222_00050_hej8h, FAILED, 1 node
   Splits: 17 total, 0 done (0.00%)
   0:01 [0 rows, 0B] [0 rows/s, 0B/s]
   
   Query 20200211_185222_00050_hej8h failed: No value present
   ```
   
   However, when I run:
   
   ```
   select id from table;
   ```
   
   it returns:
   
   ```
       id    
   ----------
    34551832 
   (1 row)
   
   Query 20200211_185250_00051_hej8h, FINISHED, 1 node
   Splits: 17 total, 17 done (100.00%)
   0:00 [1 rows, 93B] [2 rows/s, 213B/s]
   
   ```
   Is this expected behaviour, or is there an underlying issue with the setup between Hudi, AWS Glue, and Presto?
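
Since `select id` succeeds while `select *` fails, one way to narrow this down is to query each column on its own and note which one raises `No value present`. The sketch below is a hypothetical helper (`column_probe_queries` is not part of Presto or Hudi) that only generates the SQL. The column list is assumed to come from running `SHOW COLUMNS FROM table` in presto-cli. Hudi-written files also carry metadata columns such as `_hoodie_commit_time`, which are worth probing as well.

```python
from typing import List


def quote_identifier(name: str) -> str:
    """Quote a SQL identifier with double quotes, doubling any embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def column_probe_queries(table: str, columns: List[str]) -> List[str]:
    """Build one single-column SELECT per column, to be run one at a time.

    Quoting also lets a placeholder name like `table` (a SQL keyword) parse.
    """
    return [
        f"SELECT {quote_identifier(col)} FROM {quote_identifier(table)} LIMIT 1;"
        for col in columns
    ]


for query in column_probe_queries("table", ["id", "_hoodie_commit_time"]):
    print(query)
```

Each generated statement can be pasted into the `presto:my-schema>` prompt; the first one that fails points at the column (and its Glue-declared type) to inspect.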
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. Create a Spark job that reads from a Kinesis stream
   2. Save the record to S3 using Hudi
   3. An AWS Glue job catalogs the directory
   4. Using presto-cli, query the database created by AWS Glue
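
Step 2 above can be sketched in PySpark as follows. This is a minimal illustration under stated assumptions, not the job used in this report. It assumes a Structured Streaming pipeline (`foreachBatch`, available since Spark 2.4) and writes through the Hudi datasource name `org.apache.hudi`. `id` is the column queried in this issue. The precombine field `ts`, the partition field `partition`, the table name, and the S3 path are hypothetical placeholders.

```python
from typing import Dict


def hudi_write_options(table_name: str,
                       record_key: str = "id",
                       precombine_field: str = "ts",
                       partition_field: str = "partition") -> Dict[str, str]:
    """Minimal Hudi datasource write options.

    `id` comes from the issue; `ts` and `partition` are assumed column names.
    """
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.precombine.field": precombine_field,
        "hoodie.datasource.write.partitionpath.field": partition_field,
        "hoodie.datasource.write.operation": "upsert",
    }


def write_batch_to_hudi(batch_df, base_path: str, options: Dict[str, str]) -> None:
    """Append one micro-batch (a Spark DataFrame) to the Hudi dataset at base_path."""
    (batch_df.write
        .format("org.apache.hudi")
        .options(**options)
        .mode("append")
        .save(base_path))


# Hypothetical wiring (stream_df is a streaming DataFrame read from Kinesis):
#   opts = hudi_write_options("my_table")
#   stream_df.writeStream \
#       .foreachBatch(lambda df, epoch_id: write_batch_to_hudi(df, "s3://my-bucket/my_table", opts)) \
#       .start()
```

Keeping the options in a plain dict makes it easy to compare them against the table properties that Glue and Presto end up seeing.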
   
   **Expected behavior**
   
   All rows should be returned, as they are when querying a Parquet record with Spark without Hudi.
   
   **Environment Description**
   
   * Hudi version : hudi-spark-bundle:0.5.0-incubating (with spark-avro_2.11:2.4.4)
   
   * Spark version : 2.4.4
   
   * Hive version : Hive 2.3.6
   
   * Hadoop version : Amazon distribution 2.8.5
   
   * Storage (HDFS/S3/GCS..) : S3
   
   * Running on Docker? (yes/no) : no
   
   


