Hello everyone,

A lot happened this week. First of all, we had a discussion with Steven, Preston and Ian about the YARN implementation. Ian referenced three projects that help you use YARN cluster management with your application: Slider, Twill and Kitten. Slider seems to be the most promising and stable of the three, so it is the one I started learning and testing first. We discussed the limitations and the benefits, and we believe it is better to use one of them instead of writing our own classes that use YARN, as Apache Flink does. As I said, I have started testing them, and if they do not work out the way we want, I will work on creating a YARN connector for VXQuery as originally planned.

Regarding the parallel reading from HDFS, Preston explained a lot to me about the collection rules and how to implement the rule for CollectionWithTag, which is what users will include in their query if they want the data from HDFS to be read in parallel. The rule is almost ready and mostly needs testing, especially on a real distributed cluster. For these tests I set up a distributed HDFS cluster of 3 VMs, one master and two slaves, and I will run the tests on it this week.

Any insights and thoughts on these subjects are more than welcome!

Cheers,
Efi
