Within a CI/CD pipeline I use MiniDFSCluster and MiniYarnCluster when the production cluster also runs HDFS and YARN. This has proven extremely useful and has caught a lot of errors before they ever reached the cluster (i.e. it saves a lot of money).
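A minimal sketch of what such a test looks like, assuming the hadoop-hdfs test artifact is on the test classpath (the object and path names are illustrative, not from the thread):

```scala
import java.nio.file.Files

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.hdfs.MiniDFSCluster

object MiniDfsSketch {
  def main(args: Array[String]): Unit = {
    // Keep the mini cluster's data in a throwaway directory
    val baseDir = Files.createTempDirectory("minidfs").toFile
    val conf = new Configuration()
    conf.set(MiniDFSCluster.HDFS_MINIDFS_BASEDIR, baseDir.getAbsolutePath)

    // Start a single-datanode, in-process HDFS for the duration of the test
    val cluster = new MiniDFSCluster.Builder(conf).numDataNodes(1).build()
    try {
      val fs = cluster.getFileSystem
      val probe = new Path("/tmp/probe")
      fs.mkdirs(probe)
      // The code under test would read/write through `fs` here
      assert(fs.exists(probe))
    } finally {
      cluster.shutdown()
    }
  }
}
```

The same pattern works with MiniYARNCluster for jobs that must be submitted through YARN; both run entirely inside the test JVM, so the CI agent needs no Hadoop installation.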
Cf. https://wiki.apache.org/hadoop/HowToDevelopUnitTests. Works fine.

> On 13. Nov 2017, at 04:36, trs...@gmail.com wrote:
>
> @Jörn Spark without Hadoop is useful:
> - for using Spark's programming model on a single beefy instance;
> - for testing and integration within a CI/CD pipeline.
> It's ugly to have tests which depend on a cluster running somewhere.
>
>> On Sun, 12 Nov 2017 at 17:17 Jörn Franke <jornfra...@gmail.com> wrote:
>> Why do you even mind?
>>
>>> On 11. Nov 2017, at 18:42, Cristian Lorenzetto <cristian.lorenze...@gmail.com> wrote:
>>>
>>> Considering that I don't need HDFS, is there a way to remove Hadoop completely from Spark?
>>> Is YARN the only such dependency in Spark?
>>> Is there no Java or Scala (JDK languages) YARN-like library to embed in a project, instead of calling external servers?
>>> Is the YARN library difficult to customize?
>>>
>>> I asked these separate questions to understand which approach is best for me.
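The point about tests not depending on "a cluster running somewhere" can be sketched with a plain local-mode Spark session, which needs neither YARN nor HDFS (assuming spark-sql is on the test classpath; the object name and app name are illustrative):

```scala
import org.apache.spark.sql.SparkSession

object LocalModeSketch {
  def main(args: Array[String]): Unit = {
    // local[2]: driver and two executor threads all inside this JVM,
    // so no external cluster or Hadoop services are required
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("ci-smoke-test")
      .getOrCreate()
    import spark.implicits._

    // A trivial end-to-end job exercising the programming model
    val squares = Seq(1, 2, 3).toDS().map(n => n * n).collect().sorted
    assert(squares.sameElements(Array(1, 4, 9)))

    spark.stop()
  }
}
```

This is the common middle ground between the two positions in the thread: CI tests run Spark in local mode, while the mini-cluster approach above covers the HDFS/YARN integration surface when production actually uses them.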