Ambarish-Giri commented on issue #3395:
URL: https://github.com/apache/hudi/issues/3395#issuecomment-893140263


   Sure @nsivabalan, eventually our test and prod environments will be EMR only. 
But before doing the actual testing and deriving the benchmarking metrics, as I 
said earlier, I am just evaluating Hudi in my local setup to explore all of its features.
    
   For now, these are the libraries I am using:
   libraryDependencies += "org.apache.spark" %% "spark-core" % "2.4.7"
   libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.7"
   libraryDependencies += "org.apache.hudi" %% "hudi-spark-bundle" % "0.7.0"
   libraryDependencies += "org.apache.hudi" %% "hudi-utilities-bundle" % "0.7.0"
   libraryDependencies += "org.apache.spark" %% "spark-avro" % "2.4.7"
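   For completeness, a sketch of how these dependencies might sit in a full 
build.sbt (the scalaVersion shown is an assumption; it just has to match the 
_2.11/_2.12 artifact suffix that %% resolves for both the Spark and Hudi bundles):

```scala
// build.sbt -- sketch only; project name and scalaVersion are assumptions.
// Spark 2.4.7 and Hudi 0.7.0 both publish artifacts for Scala 2.11 and 2.12;
// %% appends the matching _2.11/_2.12 suffix automatically.
name := "hudi-evaluation"
scalaVersion := "2.11.12"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"             % "2.4.7",
  "org.apache.spark" %% "spark-sql"              % "2.4.7",
  "org.apache.spark" %% "spark-avro"             % "2.4.7",
  "org.apache.hudi"  %% "hudi-spark-bundle"      % "0.7.0",
  "org.apache.hudi"  %% "hudi-utilities-bundle"  % "0.7.0"
)
```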
   
   
   and while creating the SparkSession object, these are the Spark config settings:
   val spark: SparkSession = SparkSession.builder()
     .appName("hudi-datalake")
     .master("local[*]")
     .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
     .config("spark.shuffle.compress", "true")
     .config("spark.shuffle.spill.compress", "true")
     .config("spark.executor.extraJavaOptions", "-XX:+UseG1GC")
     .config("spark.sql.hive.convertMetastoreParquet", "false")
     .getOrCreate()
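
   With that session, a minimal local write to a Hudi table would look roughly 
like the sketch below (the input path, table name, and the record key / 
precombine / partition field names are illustrative assumptions, not from my 
actual setup; the option constants are the Hudi 0.7.0 Scala datasource ones):

```scala
// Sketch: upsert a DataFrame into a local COPY_ON_WRITE Hudi table.
// All paths and field names (uuid, ts, partitionpath) are hypothetical.
import org.apache.hudi.DataSourceWriteOptions._
import org.apache.hudi.config.HoodieWriteConfig

val df = spark.read.json("file:///tmp/hudi_input")  // hypothetical input

df.write.format("hudi")
  .option(RECORDKEY_FIELD_OPT_KEY, "uuid")            // unique record key
  .option(PRECOMBINE_FIELD_OPT_KEY, "ts")             // dedup/ordering field
  .option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath")
  .option(HoodieWriteConfig.TABLE_NAME, "hudi_test")
  .mode("append")
  .save("file:///tmp/hudi_test")
```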


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

