[GitHub] [incubator-hudi] adamjoneill commented on issue #1325: presto - querying nested object in parquet file created by hudi

GitBox Thu, 13 Feb 2020 11:32:27 -0800

adamjoneill commented on issue #1325: presto - querying nested object in 
parquet file created by hudi
URL: https://github.com/apache/incubator-hudi/issues/1325#issuecomment-585932919
 
 
   @vinothchandar my investigation above suggests the problem lies in how 
hudi writes parquet data. 
   
   Whilst limited in scope, and with many moving parts, my investigation 
involved:
   
   1. taking a record that includes an array of complex objects (the array 
item object has no primitive or "simple" fields) off the kinesis stream
   2. saving it to parquet in S3 using the dataFrame API
   3. saving the same record to S3 using hudi
   4. letting AWS Glue crawl these files and create the database and tables
   5. running the presto query `select * from table` against the hudi parquet 
file, which fails
   6. running the presto query `select * from table` against the dataFrame 
API parquet file, which succeeds
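
Steps 2 and 3 can be sketched roughly as below in PySpark. This is a minimal sketch under assumptions, not the exact job that was run: the helper names, the paths, and the table and field names passed in are all hypothetical placeholders. The `hoodie.*` keys are the standard Hudi datasource write options, and `org.apache.hudi` is the datasource format name.

```python
# Sketch only: helper names, paths and field names are hypothetical.

def hudi_write_options(table_name, record_key, precombine_field, partition_field):
    """Build a minimal set of Hudi datasource write options."""
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.precombine.field": precombine_field,
        "hoodie.datasource.write.partitionpath.field": partition_field,
    }


def write_both_ways(df, base_path, hudi_options):
    """Write the same DataFrame once as plain parquet and once through Hudi."""
    plain_path = f"{base_path}/plain_parquet"
    hudi_path = f"{base_path}/hudi"

    # Step 2: plain parquet through the DataFrame API.
    df.write.mode("overwrite").parquet(plain_path)

    # Step 3: the same rows through the Hudi datasource.
    (
        df.write.format("org.apache.hudi")
        .options(**hudi_options)
        .mode("overwrite")
        .save(hudi_path)
    )

    return plain_path, hudi_path
```

Writing both outputs from one DataFrame keeps the input identical, so any difference Presto sees comes from the writer rather than from the record.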
   
   I agree it does seem strange, and the stack trace does point to a presto 
issue with reading the array. Unfortunately I'm not familiar enough with the 
project to know where to begin debugging. What can I do to investigate further?


