[GitHub] [hudi] rubenssoto commented on issue #1981: [SUPPORT] Huge performance Difference Between Hudi and Regular Parquet in Athena

GitBox Thu, 20 Aug 2020 05:55:33 -0700


rubenssoto commented on issue #1981:
URL: https://github.com/apache/hudi/issues/1981#issuecomment-677647272



   Yeah, I could try.
   
   I ran some tests. The smaller table was partitioned by day, so I 
repartitioned it by year-month, which gives me larger files. My simple count 
improved a lot: it used to take 1 minute and 30 seconds and now takes 17 
seconds, but a count on the bigger table takes only 7 seconds.
   
   I tried on EMR, but I got this error:
   
   Query 20200820_125020_00004_h9eb5 failed: Not valid Parquet file: 
s3://datalake/raw/courier_api/demand_coverage/created_year_month_brt=2020-06-01/b89ad14e-8cf2-446b-934a-b27107e88e20-0_26-8-4880_20200819200116.parquet
 expected magic number: [80, 65, 82, 49] got: [51, -66, -112, 88] 
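
   For reference, the expected bytes [80, 65, 82, 49] are the ASCII string 
"PAR1", which the Parquet format places at both the start and the end of a 
file. Below is a minimal sketch for checking a local copy of the file (for 
example, one downloaded from S3). The function names and the idea of checking 
both ends are assumptions for illustration; the error above does not say 
which end of the file the reader inspected.

```python
import os

# "PAR1" in ASCII, i.e. the bytes [80, 65, 82, 49] from the error message.
PARQUET_MAGIC = b"PAR1"


def to_signed_bytes(data):
    """Render bytes as signed values (-128..127), the way Java prints them."""
    return [b - 256 if b > 127 else b for b in data]


def check_parquet_magic(path):
    """Read the first and last 4 bytes of a local file and compare to PAR1.

    Hypothetical helper: it only inspects the magic bytes and does not
    validate the footer metadata or any other part of the file.
    """
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        head = f.read(4)
        tail = b""
        if size >= 4:
            f.seek(-4, os.SEEK_END)
            tail = f.read(4)
    return {
        "size": size,
        "head": to_signed_bytes(head),
        "tail": to_signed_bytes(tail),
        "head_ok": head == PARQUET_MAGIC,
        "tail_ok": tail == PARQUET_MAGIC,
    }
```

   If the head matches but the tail does not, one possible explanation is 
that the file was read while it was still being written or was truncated, 
but that is only a guess and would need to be confirmed against the table's 
commit timeline.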


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
