[Supporting Hadoop data and cluster management] weekly update

Efi Sun, 19 Jul 2015 07:57:06 -0700

Hello everyone,

A lot for this week. First of all, we had a discussion with Steven, Preston and Ian about the YARN implementation. Ian referenced three projects that help you use YARN cluster management with your application: Slider, Twill and Kitten. Slider seems to be the most promising and stable of the three, so this is the one I started learning and testing first. We discussed the limitations and the benefits, and we believe it is better to use one of them instead of writing our own classes that use YARN, as Apache Flink does. As I said, I have started testing them, and of course if they do not work out the way we want, I will work on creating a YARN connector for VXQuery as planned at first.

Regarding the parallel reading from HDFS, Preston explained a lot to me about the collection rules and how to implement the rule for CollectionWithTag, which is what users will include in their query if they want the data from HDFS to be read in parallel. The rule is almost ready and mostly needs testing, especially with a real distributed cluster. For these tests I set up a distributed HDFS cluster of 3 VMs, one master and two slaves, and I will run the tests on them this week.


Any insights and thoughts on these subjects are more than welcome!

Cheers,
Efi
