[GitHub] [hudi] umehrot2 commented on issue #1981: [SUPPORT] Huge performance Difference Between Hudi and Regular Parquet in Athena

GitBox Thu, 20 Aug 2020 19:21:38 -0700


umehrot2 commented on issue #1981:
URL: https://github.com/apache/hudi/issues/1981#issuecomment-678000764



   @rubenssoto until this is fixed, would you be okay querying through `spark-sql` instead?
   
   Since you are using a COW (copy-on-write) table, you can make your `spark-sql` 
queries use Spark's own listing mechanism and just pass the Hoodie path filter 
to it. I think this is going to give you better query performance. Here is how 
you should start `spark-sql`:
   
   ```
   spark-sql \
     --conf "spark.serializer=org.apache.spark.serializer.KryoSerializer" \
     --conf "spark.hadoop.mapreduce.input.pathFilter.class=org.apache.hudi.hadoop.HoodieROTablePathFilter" \
     --jars /usr/lib/hudi/hudi-spark-bundle.jar,/usr/lib/spark/external/lib/spark-avro.jar
   ```
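
If you would rather run one-off queries than open an interactive shell, the same settings can be wrapped in a small helper. This is only a sketch: it reuses the jar paths and configs from the command above, relies on the `spark-sql -e` option to run a single statement, and the `SPARK_SQL_BIN` override and the table name in the usage comment are hypothetical names I made up for illustration.

```shell
# Jar locations taken from the command above; adjust if your install differs.
HUDI_JARS="/usr/lib/hudi/hudi-spark-bundle.jar,/usr/lib/spark/external/lib/spark-avro.jar"

# Run a single SQL statement with the Hoodie path filter applied.
# SPARK_SQL_BIN is a hypothetical override (defaults to spark-sql on PATH),
# handy for pointing at a specific install or for a dry run.
run_hudi_query() {
  "${SPARK_SQL_BIN:-spark-sql}" \
    --conf "spark.serializer=org.apache.spark.serializer.KryoSerializer" \
    --conf "spark.hadoop.mapreduce.input.pathFilter.class=org.apache.hudi.hadoop.HoodieROTablePathFilter" \
    --jars "$HUDI_JARS" \
    -e "$1"
}

# Usage (hypothetical table name):
# run_hudi_query "SELECT count(*) FROM my_hudi_cow_table"
```

Setting `SPARK_SQL_BIN=echo` before calling the function prints the assembled arguments instead of launching Spark, which is a quick way to check the flags.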


