[
https://issues.apache.org/jira/browse/HAMA-500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13253251#comment-13253251
]
Thomas Jungblut commented on HAMA-500:
--------------------------------------
Hi Mikalai,
I answer your questions first before my massive feedback:
bq. 1) Is this algorithm is something you have expected?
For me this is enough.
bq. 2) Have I choose the right command line format?
Yes, it's better than what we've used before ;)
bq. 3) Is exception handling appropriate for Hama?
Yeah, but it would be cool if you can output the given value as well as the
error message if you throw IllegalArgumentExceptions.
bq. 4) Is documentation style appropriate for Hama?
Yes, it is much more than we usually document, however there are a few mistakes
I explain you in the following paragraphs.
@author tags are discouraged at ASF, it's about collective code ownership and
to secure you from getting death threads when you introduce bugs.
And empty @param or @return tags are not so good as well, either fill them or
remove them.
You are using a lot of static configuration values, they won't be distributed
in a clustering environment.
Hama is distributing the jar with the class and instantiates it on each host
reflecti-o-magically. So all the numbers will stay default even if you set them
in the main method (rows and columns for example).
Therefore you have the configuration (HamaConfiguration conf in your code). You
can set values for rows and columns like this:
{noformat}
conf.setInt("matrix.rows", 10);
conf.setInt("matrix.columns", 10);
{noformat}
and retrieve it via
{noformat}
// 10 is default here if the matrix.rows property wasn't set anywhere
int rows = conf.getInt("matrix.rows", 10);
int cols = conf.getInt("matrix.columns", 10);
{noformat}
The conf will be serialized to XML and distributed with the jar and read once a
task starts.
Then I would recommend you to check for illegal values in the main method or
when you parse the args instead of setters for static variables.
And the output should be the matrix itself in a format which can be used for
further processing, e.G. the Sparse-Matrix-Vector Multiplication. So no text
about the statistics, therefore you can use the Counters in the BSPPeer class.
General Note and not really related to your code, but I guess we should use a
math lib to have sparse vectors and stuff (e.G. mahout-math). They have better
HashMap implementations that are designed for primitives (int/double pairs).
But the algorithm itself seems fine, I have not tested it yet though.
Good amount of work! Would you please fix my suggestions and upload a new
patch? (name it HAMA-500_1.patch or so). Thanks!
> Add random matrix generator.
> ----------------------------
>
> Key: HAMA-500
> URL: https://issues.apache.org/jira/browse/HAMA-500
> Project: Hama
> Issue Type: New Feature
> Components: bsp, examples
> Reporter: Edward J. Yoon
> Assignee: Mikalai Parafeniuk
> Labels: newbie
> Attachments: HAMA-500.patch
>
>
> In this issue, we add random matrix generator to produce large data sets that
> can be used for graph or matrix examples. This can be implemented using BSP
> job.
> It'd be nice if sparsity, symmetry, and size can be configurable via the
> command line.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira