Hello everyone,
There is a lot to report this week. First of all, we had a discussion with
Steven, Preston and Ian about the YARN implementation. Ian pointed out three
projects that help applications use YARN for cluster management: Slider,
Twill and Kitten. Slider seems to be the most promising and stable of the
three, so it is the one I started learning and testing first. We discussed
their limitations and benefits, and we believe it is better to use one of
them than to write our own classes on top of YARN, as Apache Flink does. I
have started testing them, and if they do not work out the way we want, I
will work on creating a YARN connector for VXQuery as originally planned.
Regarding parallel reading from HDFS, Preston explained a lot to me about the
collection rules and how to implement the rule for CollectionWithTag, which is
what users will include in their query if they want the data from HDFS to be
read in parallel. The rule is almost ready and mostly needs testing, especially
on a real distributed cluster. For these tests I set up a distributed HDFS
cluster of 3 VMs (one master and two slaves), and I will run the tests on it
this week.
Any insights and thoughts on these subjects are more than welcome!
Cheers,
Efi