[ 
https://issues.apache.org/jira/browse/HAMA-500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13253251#comment-13253251
 ] 

Thomas Jungblut commented on HAMA-500:
--------------------------------------

Hi Mikalai,

I answer your questions first before my massive feedback:

bq. 1) Is this algorithm is something you have expected?

For me this is enough.

bq. 2) Have I choose the right command line format?

Yes, it's better than what we've used before ;)

bq. 3) Is exception handling appropriate for Hama?

Yeah, but it would be cool if you can output the given value as well as the 
error message if you throw IllegalArgumentExceptions. 

bq. 4) Is documentation style appropriate for Hama?

Yes, it is much more than we usually document, however there are a few mistakes 
I explain you in the following paragraphs.

@author tags are discouraged at ASF, it's about collective code ownership and 
to secure you from getting death threads when you introduce bugs. 
And empty @param or @return tags are not so good as well, either fill them or 
remove them.

You are using a lot of static configuration values, they won't be distributed 
in a clustering environment.
Hama is distributing the jar with the class and instantiates it on each host 
reflecti-o-magically. So all the numbers will stay default even if you set them 
in the main method (rows and columns for example). 
Therefore you have the configuration (HamaConfiguration conf in your code). You 
can set values for rows and columns like this:

{noformat}
   conf.setInt("matrix.rows", 10);
   conf.setInt("matrix.columns", 10);
{noformat}

and retrieve it via

{noformat}
   // 10 is default here if the matrix.rows property wasn't set anywhere
   int rows = conf.getInt("matrix.rows", 10);
   int cols = conf.getInt("matrix.columns", 10);
{noformat}

The conf will be serialized to XML and distributed with the jar and read once a 
task starts.

Then I would recommend you to check for illegal values in the main method or 
when you parse the args instead of setters for static variables. 

And the output should be the matrix itself in a format which can be used for 
further processing, e.G. the Sparse-Matrix-Vector Multiplication. So no text 
about the statistics, therefore you can use the Counters in the BSPPeer class.

General Note and not really related to your code, but I guess we should use a 
math lib to have sparse vectors and stuff (e.G. mahout-math). They have better 
HashMap implementations that are designed for primitives (int/double pairs). 

But the algorithm itself seems fine, I have not tested it yet though. 

Good amount of work! Would you please fix my suggestions and upload a new 
patch? (name it HAMA-500_1.patch or so). Thanks!

                
> Add random matrix generator.
> ----------------------------
>
>                 Key: HAMA-500
>                 URL: https://issues.apache.org/jira/browse/HAMA-500
>             Project: Hama
>          Issue Type: New Feature
>          Components: bsp, examples
>            Reporter: Edward J. Yoon
>            Assignee: Mikalai Parafeniuk
>              Labels: newbie
>         Attachments: HAMA-500.patch
>
>
> In this issue, we add random matrix generator to produce large data sets that 
> can be used for graph or matrix examples. This can be implemented using BSP 
> job.
> It'd be nice if sparsity, symmetry, and size can be configurable via the 
> command line.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to