Hello everyone,
This week's update has two parts. The first is the CollectionWithTagRule,
which is about reading blocks from HDFS using the XMLInputFormat
class. This rule informs the parser that it needs to read its data in
blocks from HDFS and passes some additional information that is needed
in order to read the items correctly. I made one change in the
XMLInputFormat class. The class reads a block from HDFS and looks for
the opening and closing tags that the user specified in the query. Until
now I did not take into account that the opening tag may carry
more information regarding the item, for example:
<book name="something">
...
...
</book>
but I was only looking for tags like:
<book>
...
...
</book>
I changed that to take into account that the opening tag may contain
additional information and to include it in the returned item.
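
As a rough illustration of the matching logic only (written in Python
for brevity, not the actual Java code of XMLInputFormat), the idea can
be sketched as below. The function name extract_items is hypothetical,
and the sketch assumes that items are not nested, that attribute values
do not contain ">", and that each item lies entirely inside one block.

```python
import re


def extract_items(block: str, tag: str) -> list:
    """Return every <tag ...>...</tag> item found in block.

    Hypothetical helper for illustration; the opening tag is kept
    in full, including any attributes it carries.
    """
    name = re.escape(tag)
    # Opening tag: "<tag>" or "<tag" followed by whitespace and attributes.
    # Requiring whitespace or ">" right after the name keeps "<book"
    # from matching a longer name such as "<bookstore>".
    pattern = re.compile(rf"<{name}(?:\s[^>]*)?>.*?</{name}\s*>", re.DOTALL)
    return [match.group(0) for match in pattern.finditer(block)]


block = '<books><book name="something">A</book><book>B</book><bookstore/></books>'
items = extract_items(block, "book")
for item in items:
    print(item)
```

Both the item with attributes and the plain one are returned with their
opening tags intact, while <books> and <bookstore/> are skipped.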
The second part of the update is about the two YARN applications, Slider
and Twill, that I tested this week, and my conclusions about which one
can be used better with VXQuery.
- Slider: Requires mostly configuration files and Python scripts for
the application to work, which I find very good and generic, because with
small changes to the configuration you can reuse the same work in similar
projects.
- Twill: Requires ZooKeeper installed along with YARN in order to
work. This application mostly needs changes in the code of the project
you want to use with Twill.
Based on these, I find Slider, yet again, the better candidate. Still, if
anyone has more experience with either of these systems, I would like you
to give me some feedback on my observations and, of course, on which one
is best.
Thank you,
Efi