Within a CI/CD pipeline I use MiniDFSCluster and MiniYarnCluster if the 
production cluster also runs HDFS and YARN. This has proven extremely 
useful and has caught a lot of errors before they reach the cluster (i.e. 
it saves a lot of money).
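
For HDFS, a minimal JUnit sketch looks like the following (the class name 
and paths are made up for illustration; MiniDFSCluster ships in the 
hadoop-hdfs test artifact):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hdfs.MiniDFSCluster;
    import org.junit.After;
    import org.junit.Before;
    import org.junit.Test;
    import static org.junit.Assert.assertTrue;

    public class MiniDfsPipelineTest {

        private MiniDFSCluster cluster;
        private FileSystem fs;

        @Before
        public void startCluster() throws Exception {
            Configuration conf = new Configuration();
            // Keep NameNode/DataNode data under the build dir so CI can clean it up.
            conf.set(MiniDFSCluster.HDFS_MINIDFS_BASEDIR, "target/minidfs");
            cluster = new MiniDFSCluster.Builder(conf).numDataNodes(1).build();
            cluster.waitClusterUp();
            fs = cluster.getFileSystem();
        }

        @After
        public void stopCluster() {
            if (cluster != null) {
                cluster.shutdown();
            }
        }

        @Test
        public void roundTripsAFile() throws Exception {
            Path p = new Path("/it/hello.txt");
            try (java.io.OutputStream out = fs.create(p)) {
                out.write("hello".getBytes("UTF-8"));
            }
            // The job under test would read/write HDFS here;
            // we just check that the write landed.
            assertTrue(fs.exists(p));
        }
    }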

Cf. https://wiki.apache.org/hadoop/HowToDevelopUnitTests

Works fine.
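
For YARN the setup is analogous (again just a sketch - MiniYARNCluster 
ships in the hadoop-yarn-server-tests test artifact, and the test name and 
sizing below are arbitrary):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.yarn.conf.YarnConfiguration;
    import org.apache.hadoop.yarn.server.MiniYARNCluster;

    public class MiniYarnPipelineExample {
        public static void main(String[] args) throws Exception {
            YarnConfiguration conf = new YarnConfiguration();
            // args: testName, numNodeManagers, numLocalDirs, numLogDirs
            MiniYARNCluster yarn = new MiniYARNCluster("pipeline-test", 1, 1, 1);
            yarn.init(conf);
            yarn.start();
            // The started cluster rewrites its config with the dynamically
            // assigned ResourceManager address; submit test jobs against it.
            Configuration clusterConf = yarn.getConfig();
            System.out.println("RM at " + clusterConf.get(YarnConfiguration.RM_ADDRESS));
            yarn.stop();
        }
    }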

> On 13. Nov 2017, at 04:36, trs...@gmail.com wrote:
> 
> @Jörn Spark without Hadoop is useful:
> for using Spark's programming model on a single beefy instance, and
> for testing and integrating with a CI/CD pipeline.
> It's ugly to have tests which depend on a cluster running somewhere.
> 
> 
>> On Sun, 12 Nov 2017 at 17:17 Jörn Franke <jornfra...@gmail.com> wrote:
>> Why do you even mind?
>> 
>> > On 11. Nov 2017, at 18:42, Cristian Lorenzetto 
>> > <cristian.lorenze...@gmail.com> wrote:
>> >
>> > Considering the case where I don't need HDFS, is there a way to remove 
>> > Hadoop completely from Spark?
>> > Is YARN the only Hadoop dependency in Spark?
>> > Is there no Java or Scala (JDK languages) YARN-like library to embed in 
>> > a project instead of calling external servers?
>> > Is the YARN library difficult to customize?
>> >
>> > I asked these different questions to understand which approach is best 
>> > for me.
>> 
