[
https://issues.apache.org/jira/browse/HAMA-580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13282226#comment-13282226
]
Thomas Jungblut commented on HAMA-580:
--------------------------------------
Do you understand how it works?
{noformat}
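// Shared configuration for both writables.
Configuration conf = new Configuration();
VertexWritable.CONFIGURATION = conf;
VertexArrayWritable.CONFIGURATION = conf;
// Vertex ID and value classes, set once instead of being written per vertex.
VertexWritable.VERTEX_ID_CLASS = Text.class;
VertexWritable.VERTEX_VALUE_CLASS = IntWritable.class;
// Edge ID and value classes, likewise set once.
VertexArrayWritable.EDGE_ID_CLASS = Text.class;
VertexArrayWritable.EDGE_VALUE_CLASS = IntWritable.class;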
{noformat}
bq. Should we use templates again?
The big-file issues arise from templates, mainly because you have to write the
class names for every part of every vertex.
So the file contains the same class names thousands of times, each as a UTF-8 string.
That is not needed, because the classes should be constant across all vertices and
known on the client side as well as on the job side.
The static setting is not a very good solution, but it saves so much space
compared to fields.
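
To make the size argument concrete, here is a minimal stand-alone sketch that serializes the same toy vertices twice: once with the class names written in front of every value, and once with only the values. It is not the Hama serialization code; the class ClassNameOverhead, its method names, and the "v" + i vertex IDs are hypothetical and exist only for illustration, and it uses plain java.io instead of the Writable classes.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class ClassNameOverhead {

  // Assumed class names that a per-record format would repeat for each vertex.
  static final String ID_CLASS = "org.apache.hadoop.io.Text";
  static final String VALUE_CLASS = "org.apache.hadoop.io.IntWritable";

  // Serializes each vertex with its class names written in front of the data.
  static int sizeWithClassNames(int vertices) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      for (int i = 0; i < vertices; i++) {
        out.writeUTF(ID_CLASS);    // repeated for every vertex
        out.writeUTF("v" + i);     // toy vertex ID
        out.writeUTF(VALUE_CLASS); // repeated for every vertex
        out.writeInt(i);           // toy vertex value
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.size();
  }

  // Serializes only the data; the classes are assumed to be known to the reader.
  static int sizeWithoutClassNames(int vertices) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      for (int i = 0; i < vertices; i++) {
        out.writeUTF("v" + i);
        out.writeInt(i);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.size();
  }

  public static void main(String[] args) {
    int n = 1000;
    System.out.println("with class names:    " + sizeWithClassNames(n) + " bytes");
    System.out.println("without class names: " + sizeWithoutClassNames(n) + " bytes");
  }
}
```

In this sketch the difference per vertex is exactly the two length-prefixed class-name strings, so the overhead grows linearly with the number of vertices while the information it carries stays constant.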
> Improve input of graph module
> -----------------------------
>
> Key: HAMA-580
> URL: https://issues.apache.org/jira/browse/HAMA-580
> Project: Hama
> Issue Type: Improvement
> Components: graph
> Affects Versions: 0.5.0
> Reporter: Thomas Jungblut
> Assignee: Thomas Jungblut
> Fix For: 0.5.0
>
> Attachments: HAMA-580.patch, HAMA-580_1.patch
>
>
> Currently it is too verbose: the Wikipedia dataset is bloated
> from 0.95 GB to 5 GB just because the class names are written x times.