[#131]Supporting Hadoop data and cluster management

Efi Sun, 17 May 2015 13:16:35 -0700

Hello everyone,

Here is my update on what I did this past week:

I created an XMLInputFormat Java class with the functionality that Hamza described in the issue [1]. The class reads from blocks located in HDFS and returns complete items according to a specified XML tag. I also tested this class in a standalone Hadoop cluster with XML files of various sizes, the smallest being a single 400 MB file and the largest a collection of 5 files totalling 6.1 GB.
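For anyone following along, here is a minimal sketch of the split-boundary logic such an input format usually needs. It assumes a common convention for XML input formats in Hadoop: an item belongs to the split in which its start tag begins, and the reader may read past the split end to finish that item. It is plain Java over a byte array rather than Hadoop's RecordReader API, the names (XmlSplitReaderSketch, readRecords) are hypothetical, and it is not the code attached to VXQUERY-131. It also assumes items with the given tag are never nested and never self-closing.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the XMLInputFormat from VXQUERY-131: it mimics the
// split handling of an XML record reader over a plain byte array instead of HDFS.
// Assumptions: items with the given tag are not nested and are never self-closing.
final class XmlSplitReaderSketch {

    /** Returns every complete <tag ...>...</tag> item whose start tag begins in [start, end). */
    static List<String> readRecords(byte[] data, int start, int end, String tag) {
        byte[] open = ("<" + tag).getBytes(StandardCharsets.UTF_8);
        byte[] close = ("</" + tag + ">").getBytes(StandardCharsets.UTF_8);
        List<String> records = new ArrayList<>();
        int pos = start;
        while (true) {
            // The start tag must begin inside this split...
            int begin = findStartTag(data, open, pos, end);
            if (begin < 0) break;
            // ...but the end tag may lie past the split end (e.g. in the next block).
            int finish = indexOf(data, close, begin + open.length, data.length);
            if (finish < 0) break; // truncated item at end of input: drop it
            int recordEnd = finish + close.length;
            records.add(new String(data, begin, recordEnd - begin, StandardCharsets.UTF_8));
            pos = recordEnd;
        }
        return records;
    }

    // Finds "<tag" followed by '>' or whitespace, so "<items>" is not taken for "<item".
    private static int findStartTag(byte[] data, byte[] open, int from, int limit) {
        for (int i = indexOf(data, open, from, limit); i >= 0; i = indexOf(data, open, i + 1, limit)) {
            int next = i + open.length;
            if (next < data.length && isBoundary(data[next])) return i;
        }
        return -1;
    }

    private static boolean isBoundary(byte b) {
        return b == '>' || b == ' ' || b == '\t' || b == '\n' || b == '\r';
    }

    // Naive search for the first occurrence of pattern starting at an index in [from, limit).
    private static int indexOf(byte[] data, byte[] pattern, int from, int limit) {
        for (int i = from; i < limit && i + pattern.length <= data.length; i++) {
            int j = 0;
            while (j < pattern.length && data[i + j] == pattern[j]) j++;
            if (j == pattern.length) return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] xml = "<root><item>a</item><item id=\"2\">b</item><items/></root>"
                .getBytes(StandardCharsets.UTF_8);
        int mid = 28; // artificial split boundary inside the second <item>
        System.out.println(readRecords(xml, 0, mid, "item"));          // [<item>a</item>, <item id="2">b</item>]
        System.out.println(readRecords(xml, mid, xml.length, "item")); // []
    }
}
```

In the example the second item starts before the artificial boundary at byte 28 and ends after it, so the first split finishes it and the second split skips it. This avoids both truncated and duplicated items when a file spans several blocks.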

This week I will create another implementation of the XMLInputFormat with a different way of reading and delivering files, the one I described in the same issue, and I will test both solutions in a standalone setup and on a small Hadoop cluster (5-6 nodes).

You can see this week's results here [2]. I will keep updating this file with the results of the other tests.


Best regards,
Efi

[1] https://issues.apache.org/jira/browse/VXQUERY-131

[2] https://docs.google.com/spreadsheets/d/1kyIPR7izNMbU8ctIe34rguElaoYiWQmJpAwDb0t9MCw/edit?usp=sharing
