Hello everyone,

The update for this week consists of two parts. The first is the CollectionWithTagRule, which is about reading the blocks from HDFS using the XMLInputFormat class. This rule informs the parser that it needs to read its data in blocks from HDFS and passes some additional information that is needed in order to read the items correctly.

I made one change in the XMLInputFormat class. The class reads a block from HDFS and looks for the opening and closing tags that the user specified in the query. Until now I did not take into account that the opening tag may carry more information regarding the item, for example:
<book name="something">
...
...
</book>

but I was only looking for tags like:
<book>
...
...
</book>

I changed that so the opening tag may contain additional information, which is now included in the returned item.
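
To illustrate the idea, here is a minimal sketch in Python. It is not the actual Java code of XMLInputFormat; the function name extract_items is hypothetical, and the sketch assumes the whole block is already in memory as a string, that items with the same tag are not nested, that there are no self-closing tags, and that attribute values do not contain ">".

```python
import re


def extract_items(block: str, tag: str) -> list:
    """Return every <tag ...>...</tag> item found in block, attributes included.

    Hypothetical helper for illustration only, not part of VXQuery.
    """
    # The opening tag is the tag name followed either directly by '>' or by
    # whitespace and attributes. This way <book> and <book name="x"> both
    # match, while a longer name such as <bookstore> does not.
    pattern = re.compile(
        r"<{0}(?:\s[^>]*)?>.*?</{0}>".format(re.escape(tag)),
        re.DOTALL,  # items usually span several lines
    )
    return pattern.findall(block)
```

Calling extract_items(block, "book") on a block that holds both kinds of opening tag returns each item as one string, with the attributes of the opening tag kept in it.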

The second part of the update is about the YARN applications, Slider and Twill, that I tested this week, and my conclusions about which one can be used with VXQuery better.

- Slider: Requires mostly configuration files and Python scripts for the application to work. I find this very good and generic, because with small changes to the configuration you can reuse the same work in similar projects.
- Twill: Requires ZooKeeper to be installed along with YARN in order to work. It mostly needs changes in the code of the project you want to use with Twill.

Based on these observations I find Slider, yet again, the better candidate. Still, if anyone has more experience with either of these systems, I would appreciate feedback on my observations and, of course, on which one is best.

Thank you,
Efi
