Hello everyone,
This week's update is about the changes that I mentioned in my last
update. The JUnit test is not completed yet; I am using a MiniDFSCluster
implementation for the tests, but I haven't managed to get it to work
correctly yet. I believe the problems are trivial and have not reported
them in the ticket so far. I will create a ticket if I continue to
receive the same errors.
About the input splits, I have implemented a scheduler that maps each
split to the node that should process it, based on the split's location
and the ratio of splits to nodes. I need to test this as well before I
commit it.
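A locality-aware scheduler of this kind might look like the sketch below. This is only an illustration under assumptions, not VXQuery's actual code: the names SplitScheduler, SplitInfo, and assignSplits are invented for the example, and the policy (prefer a node holding the data locally, break ties by current load) is one plausible reading of "according to the split's location and the number of splits - nodes".

```java
import java.util.*;

// Hypothetical sketch of a locality-aware split scheduler; class and
// method names are illustrative, not VXQuery's real API.
public class SplitScheduler {

    // A split knows which cluster nodes hold a local replica of its data.
    static class SplitInfo {
        final String id;
        final Set<String> localHosts;
        SplitInfo(String id, String... hosts) {
            this.id = id;
            this.localHosts = new HashSet<>(Arrays.asList(hosts));
        }
    }

    // Assign each split to a node, preferring the least-loaded node that
    // holds the split locally; if no replica is local, fall back to the
    // least-loaded node overall (a remote read).
    static Map<String, String> assignSplits(List<SplitInfo> splits,
                                            List<String> nodes) {
        Map<String, Integer> load = new HashMap<>();
        for (String n : nodes) load.put(n, 0);
        Map<String, String> assignment = new LinkedHashMap<>();
        for (SplitInfo s : splits) {
            String best = null;
            for (String n : nodes) {
                if (s.localHosts.contains(n)
                        && (best == null || load.get(n) < load.get(best))) {
                    best = n;
                }
            }
            if (best == null) {
                for (String n : nodes) {
                    if (best == null || load.get(n) < load.get(best)) best = n;
                }
            }
            assignment.put(s.id, best);
            load.put(best, load.get(best) + 1);
        }
        return assignment;
    }

    public static void main(String[] args) {
        List<SplitInfo> splits = Arrays.asList(
                new SplitInfo("s0", "node1"),
                new SplitInfo("s1", "node2"),
                new SplitInfo("s2", "node3")); // node3 not in cluster -> remote
        List<String> nodes = Arrays.asList("node1", "node2");
        // prints {s0=node1, s1=node2, s2=node1}
        System.out.println(assignSplits(splits, nodes));
    }
}
```

The balancing-by-load part keeps one fast node from being handed every local split when replicas cluster on it.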
That's all for this week.
Best regards,
Efi
On 25/06/2015 07:30 PM, Efi wrote:
Thank you Eldon, that was very helpful and I had completely
overlooked it when I first set up my Eclipse for VXQuery.
This week I continued working on reading blocks from HDFS. I used some
of the hyracks-hdfs-core classes and methods and was able to get the
splits of input files from HDFS without having to use a Map function. I
will continue working on how to distribute and correctly read the
splits between the nodes of the VXQuery cluster.
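The information a split carries can be computed directly from file metadata, which is essentially what skipping the Map function buys. Below is a minimal, self-contained sketch of that idea; it is not the hyracks-hdfs-core API, just plain Java deriving block-aligned (offset, length) splits from a file's length and the block size:

```java
import java.util.*;

// Illustrative only: computing block-aligned input splits for one file
// from its length and the HDFS block size, without running a Map task.
public class SplitCalc {

    // Each split covers the byte range [offset, offset + length).
    static class Split {
        final long offset, length;
        Split(long offset, long length) {
            this.offset = offset;
            this.length = length;
        }
        @Override public String toString() { return offset + "+" + length; }
    }

    // One split per block; the final split is shorter when the file
    // length is not a multiple of the block size.
    static List<Split> computeSplits(long fileLength, long blockSize) {
        List<Split> splits = new ArrayList<>();
        for (long off = 0; off < fileLength; off += blockSize) {
            splits.add(new Split(off, Math.min(blockSize, fileLength - off)));
        }
        return splits;
    }

    public static void main(String[] args) {
        // A 300-byte file with a 128-byte block size yields three splits.
        // prints [0+128, 128+128, 256+44]
        System.out.println(computeSplits(300, 128));
    }
}
```

In a real cluster each split would also carry the hosts of its block's replicas, which is the locality information the distribution step needs.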
I will also make some changes to the JUnit tests for HDFS. They will
start a temporary DFS cluster in order to run the tests, instead of
just failing when the user does not have an HDFS cluster.
Cheers,
Efi
On 16/06/2015 08:42 PM, Eldon Carman wrote:
Looks good. One quick comment, take a look at our code format and style
guidelines. You can set up eclipse to format your code for you using our
sister project's code format profile [1].
[1] http://vxquery.apache.org/development_eclipse_setup.html
On Sat, Jun 13, 2015 at 11:03 AM, Michael Carey <mjca...@ics.uci.edu>
wrote:
Very cool!!
On 6/13/15 9:38 AM, Efi wrote:
Hello everyone,
The reading of a single document and a collection of documents from
HDFS is completed and tested. New JUnit tests are added in the xtest
project; they are just copies of the aggregate tests, changed a bit to
run for the collection reading from HDFS.
I added another option in xtest in order for the HDFS tests to run
successfully. It is a boolean option called /hdfs/ and it enables the
tests for HDFS to run.
You can view these in the branch /hdfs2_read/ in my GitHub fork of
VXQuery. [1]
I will continue with the parallel reading from HDFS.
Best Regards,
Efi
[1] https://github.com/efikalti/vxquery/tree/hdfs2_read
On 04/06/2015 08:50 PM, Eldon Carman wrote:
We have a set of JUnit tests to validate VXQuery. I think it would be a
good idea to add test cases that validate the HDFS code you're adding
to the code base. Take a look at the vxquery-xtest sub-project. The
VXQuery Catalog holds all the vxquery test cases [1]. You could add a
new HDFS test group to this catalog.
[1] https://github.com/apache/vxquery/blob/master/vxquery-xtest/src/test/resources/VXQueryCatalog.xml
On Thu, Jun 4, 2015 at 10:26 AM, Efi <efika...@gmail.com> wrote:
Hello everyone,
This week Preston and Steven helped me with the VXQuery code, and
specifically with where my parser and two more functionalities will fit
in the code.
Along with the HDFS parallel parser that I have been working on these
past weeks, two more methods will be implemented. They will both read
whole files from HDFS and not just blocks. One will read all the files
located in a directory in HDFS and the other will read a single
document.
The reading of files from a directory is completed, and for the next
week I will focus on testing it and implementing/testing the second
method, reading a single document.
Best regards,
Efi