adamjoneill commented on issue #1325: presto - querying nested object in parquet file created by hudi
URL: https://github.com/apache/incubator-hudi/issues/1325#issuecomment-585932919

@vinothchandar my investigation above suggests the problem lies in how Hudi writes Parquet data. While limited in scope and with many moving parts, my investigation involved:

1. taking a record that includes an array of complex objects (the array item object contains no primitive or "simple" types) off the Kinesis stream
2. saving it to Parquet in S3 using the DataFrame API
3. saving the same record to S3 using Hudi
4. letting AWS Glue crawl these files and create the database and tables
5. running the Presto query `select * from table` against the Hudi Parquet file, which fails
6. running the Presto query `select * from table` against the file written with the Spark DataFrame API, which succeeds

I agree it does seem strange, and the stack trace does point to a Presto issue with reading the array. Unfortunately, I'm not familiar enough with the project to know where to begin debugging the issue. What can I do to investigate further?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

With regards,
Apache Git Services
