Ambarish-Giri commented on issue #3395:
URL: https://github.com/apache/hudi/issues/3395#issuecomment-893140263


   Sure @nsivabalan, eventually our test and prod environments will be EMR only. 
But before doing the actual testing and deriving the benchmarking metrics, as I 
said earlier, I am just evaluating Hudi in my local setup to explore all of its features.
    
   For now, these are the libraries I am using:
   libraryDependencies += "org.apache.spark" %% "spark-core" % "2.4.7"
   libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.7"
   libraryDependencies += "org.apache.hudi" %% "hudi-spark-bundle" % "0.7.0"
   libraryDependencies += "org.apache.hudi" %% "hudi-utilities-bundle" % "0.7.0"
   libraryDependencies += "org.apache.spark" %% "spark-avro" % "2.4.7"
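   For completeness, a sketch of how these dependencies might sit in a full 
build.sbt (the scalaVersion shown is an assumption; it just has to match the 
_2.11/_2.12 artifact suffix that %% resolves for both the Spark and Hudi bundles):

```scala
// build.sbt -- sketch only; project name and scalaVersion are assumptions.
// Spark 2.4.7 and Hudi 0.7.0 both publish artifacts for Scala 2.11 and 2.12;
// %% appends the matching _2.11/_2.12 suffix automatically.
name := "hudi-evaluation"
scalaVersion := "2.11.12"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"             % "2.4.7",
  "org.apache.spark" %% "spark-sql"              % "2.4.7",
  "org.apache.spark" %% "spark-avro"             % "2.4.7",
  "org.apache.hudi"  %% "hudi-spark-bundle"      % "0.7.0",
  "org.apache.hudi"  %% "hudi-utilities-bundle"  % "0.7.0"
)
```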
   
   
   and while creating the SparkSession object, these are the Spark config settings:
   val spark: SparkSession = SparkSession.builder()
     .appName("hudi-datalake")
     .master("local[*]")
     .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
     .config("spark.shuffle.compress", "true")
     .config("spark.shuffle.spill.compress", "true")
     .config("spark.executor.extraJavaOptions", "-XX:+UseG1GC")
     .config("spark.sql.hive.convertMetastoreParquet", "false")
     .getOrCreate()
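
   With that session, a minimal local write to a Hudi table would look roughly 
like the sketch below (the input path, table name, and the record key / 
precombine / partition field names are illustrative assumptions, not from my 
actual setup; the option constants are the Hudi 0.7.0 Scala datasource ones):

```scala
// Sketch: upsert a DataFrame into a local COPY_ON_WRITE Hudi table.
// All paths and field names (uuid, ts, partitionpath) are hypothetical.
import org.apache.hudi.DataSourceWriteOptions._
import org.apache.hudi.config.HoodieWriteConfig

val df = spark.read.json("file:///tmp/hudi_input")  // hypothetical input

df.write.format("hudi")
  .option(RECORDKEY_FIELD_OPT_KEY, "uuid")            // unique record key
  .option(PRECOMBINE_FIELD_OPT_KEY, "ts")             // dedup/ordering field
  .option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath")
  .option(HoodieWriteConfig.TABLE_NAME, "hudi_test")
  .mode("append")
  .save("file:///tmp/hudi_test")
```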


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

