eigenvalue of matrix

2014-02-24 Thread lujing zui
I searched JIRA and found that an eigenvalue algorithm has been implemented in HAMA,
but I cannot find any module for it in the code.
Where can I find it?


Re: Cutting a 0.7 release

2014-02-24 Thread Edward J. Yoon
That's a huge diagram :-) Do you plan to work on HAMA-505, or create a new one?

On Tue, Feb 25, 2014 at 1:33 PM, Chia-Hung Lin  wrote:
> Just let you know I may refactor based on the following diagram.
>
> http://people.apache.org/~chl501/diagram1.png
>
> That sketches the basic flow required for ft. I am currently evaluate
> related parts, so it's subjected to change.
>
>
>
>
>
>
> On 24 February 2014 20:52, Edward J. Yoon  wrote:
>> 0.6.4 or 0.7.0, Both are OK to me.
>>
>> Just FYI,
>>
>> The memory efficiency has been significantly (almost x2-3) improved by
>> runtime message serialization and compression. See
>> https://wiki.apache.org/hama/Benchmarks#PageRank_Performance_0.7.0-SNAPSHOT_vs_0.6.3
>> (I'll attach more benchmarks and comparisons with other systems result
>> soon). And, we've fixed many bugs. e.g., K-Means, NeuralNetwork,
>> SemiClustering, Graph's Combiners HAMA-857.
>>
>> According to my personal evaluations, current system is fairly
>> respectable. As I mentioned before, I believe we should stick to
>> in-memory style since the today's machines can be equipped with up to
>> 128 GB. Disk (or disk hybrid) based queue is a optional, not a
>> must-have.
>>
>> Once we release this one, we finally might want to focus on below issues:
>>
>> * Fault tolerant job processing (checkpoint recovery)
>> * Support GPUs and InfiniBand
>>
>> Then, I think we can release version 1.0.
>>
>> On Mon, Feb 24, 2014 at 8:44 PM, Tommaso Teofili
>>  wrote:
>>> Would you cut 0.7 or 0.6.4 ?
>>> I'd go with 0.6.4 as I think the next minor version change should be due to
>>> significant feature additions / changes and / or stability / scalability
>>> improvements.
>>>
>>> Regards,
>>> Tommaso
>>>
>>>
>>> 2014-02-24 8:47 GMT+01:00 Edward J. Yoon :
>>>
 Hi all,

 I plan on cutting a release next week. If you have some opinions, Pls feel
 free to comment here.

 Sent from my iPhone
>>
>>
>>
>> --
>> Edward J. Yoon (@eddieyoon)
>> Chief Executive Officer
>> DataSayer, Inc.



-- 
Edward J. Yoon (@eddieyoon)
Chief Executive Officer
DataSayer, Inc.


[jira] [Commented] (HAMA-867) HAMA 0.7 Snapshot cannot work with HDFS 2X due to lack of libs

2014-02-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13911252#comment-13911252
 ] 

Hudson commented on HAMA-867:
-

SUCCESS: Integrated in Hama-trunk #279 (See 
[https://builds.apache.org/job/Hama-trunk/279/])
HAMA-867: change url to maven central repository (edwardyoon: rev 1571556)
* /hama/trunk/pom.xml


> HAMA 0.7 Snapshot cannot work with HDFS 2X due to lack of libs
> --
>
> Key: HAMA-867
> URL: https://issues.apache.org/jira/browse/HAMA-867
> Project: Hama
>  Issue Type: Bug
>  Components: build 
>Affects Versions: 0.7.0
> Environment: Fedora 17, hama 0.7, Hadoop 2.2.0, jdk 1.7
>Reporter: Skater Xu
>Assignee: Skater Xu
> Fix For: 0.7.0
>
> Attachments: HAMA-867.Patch, HAMA-867.patch.2
>
>
> HAMA 0.7 Snapshot cannot work with HDFS 2X due to lack of libs



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


Re: Cutting a 0.7 release

2014-02-24 Thread Chia-Hung Lin
Programmers can't control Java memory the way malloc/free does in C, and with
type boxing/unboxing, etc., it does not seem easy to estimate memory usage.
So it would be good to stick to an Erlang-style fail-fast approach. Or we could
have a program that loads the data and measures the actual memory usage.
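As a rough illustration of the "load data and measure" idea (this is only a sketch,
not part of Hama; the MemoryProbe class and loadSample() loader are made up for the example):

// Hedged sketch: estimate the heap actually consumed by a loaded data set by
// sampling used memory before and after loading it. GC is only a hint, so the
// numbers are approximate.
public class MemoryProbe {
  static long usedBytes() {
    Runtime rt = Runtime.getRuntime();
    System.gc();                                     // best-effort hint only
    return rt.totalMemory() - rt.freeMemory();
  }

  public static void main(String[] args) {
    long before = usedBytes();
    java.util.List<double[]> records = loadSample(); // hypothetical loader
    long after = usedBytes();
    System.out.printf("~%d MB used by %d records%n",
        (after - before) >> 20, records.size());
  }

  // Stand-in for whatever actually reads the input split.
  private static java.util.List<double[]> loadSample() {
    java.util.List<double[]> data = new java.util.ArrayList<>();
    for (int i = 0; i < 1_000_000; i++) {
      data.add(new double[] { i, i * 0.5 });
    }
    return data;
  }
}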


On 24 February 2014 22:32, Tommaso Teofili  wrote:
> 2014-02-24 13:52 GMT+01:00 Edward J. Yoon :
>
>> 0.6.4 or 0.7.0, Both are OK to me.
>>
>> Just FYI,
>>
>> The memory efficiency has been significantly (almost x2-3) improved by
>> runtime message serialization and compression. See
>>
>> https://wiki.apache.org/hama/Benchmarks#PageRank_Performance_0.7.0-SNAPSHOT_vs_0.6.3
>> (I'll attach more benchmarks and comparisons with other systems result
>> soon). And, we've fixed many bugs. e.g., K-Means, NeuralNetwork,
>> SemiClustering, Graph's Combiners HAMA-857.
>>
>
> sure, all the above things look good to me.
>
>
>>
>> According to my personal evaluations, current system is fairly
>> respectable. As I mentioned before, I believe we should stick to
>> in-memory style since the today's machines can be equipped with up to
>> 128 GB. Disk (or disk hybrid) based queue is a optional, not a
>> must-have.
>>
>
> right, the only thing that I think we need to address before 0.7.0 is
> related to the OutOfMemory errors (especially when dealing with large
> graphs); for example IMHO even if the memory is not enough to store all the
> graph vertices assigned to a certain peer, a scalable system should never
> throw OOM exceptions, instead it may eventually process items slower (with
> caches / queues) but never throw an exception for that but that's just my
> opinion.
>
>
>>
>> Once we release this one, we finally might want to focus on below issues:
>>
>> * Fault tolerant job processing (checkpoint recovery)
>>
>
> +1
>
>
>> * Support GPUs and InfiniBand
>>
>
> +1 for the former, not sure about the latter.
>
>
>>
>> Then, I think we can release version 1.0.
>>
>
> My 2 cents,
> Tommaso
>
>
>>
>> On Mon, Feb 24, 2014 at 8:44 PM, Tommaso Teofili
>>  wrote:
>> > Would you cut 0.7 or 0.6.4 ?
>> > I'd go with 0.6.4 as I think the next minor version change should be due
>> to
>> > significant feature additions / changes and / or stability / scalability
>> > improvements.
>> >
>> > Regards,
>> > Tommaso
>> >
>> >
>> > 2014-02-24 8:47 GMT+01:00 Edward J. Yoon :
>> >
>> >> Hi all,
>> >>
>> >> I plan on cutting a release next week. If you have some opinions, Pls
>> feel
>> >> free to comment here.
>> >>
>> >> Sent from my iPhone
>>
>>
>>
>> --
>> Edward J. Yoon (@eddieyoon)
>> Chief Executive Officer
>> DataSayer, Inc.
>>


Re: Cutting a 0.7 release

2014-02-24 Thread Chia-Hung Lin
Just to let you know, I may refactor based on the following diagram.

http://people.apache.org/~chl501/diagram1.png

It sketches the basic flow required for fault tolerance (ft). I am currently
evaluating the related parts, so it is subject to change.






On 24 February 2014 20:52, Edward J. Yoon  wrote:
> 0.6.4 or 0.7.0, Both are OK to me.
>
> Just FYI,
>
> The memory efficiency has been significantly (almost x2-3) improved by
> runtime message serialization and compression. See
> https://wiki.apache.org/hama/Benchmarks#PageRank_Performance_0.7.0-SNAPSHOT_vs_0.6.3
> (I'll attach more benchmarks and comparisons with other systems result
> soon). And, we've fixed many bugs. e.g., K-Means, NeuralNetwork,
> SemiClustering, Graph's Combiners HAMA-857.
>
> According to my personal evaluations, current system is fairly
> respectable. As I mentioned before, I believe we should stick to
> in-memory style since the today's machines can be equipped with up to
> 128 GB. Disk (or disk hybrid) based queue is a optional, not a
> must-have.
>
> Once we release this one, we finally might want to focus on below issues:
>
> * Fault tolerant job processing (checkpoint recovery)
> * Support GPUs and InfiniBand
>
> Then, I think we can release version 1.0.
>
> On Mon, Feb 24, 2014 at 8:44 PM, Tommaso Teofili
>  wrote:
>> Would you cut 0.7 or 0.6.4 ?
>> I'd go with 0.6.4 as I think the next minor version change should be due to
>> significant feature additions / changes and / or stability / scalability
>> improvements.
>>
>> Regards,
>> Tommaso
>>
>>
>> 2014-02-24 8:47 GMT+01:00 Edward J. Yoon :
>>
>>> Hi all,
>>>
>>> I plan on cutting a release next week. If you have some opinions, Pls feel
>>> free to comment here.
>>>
>>> Sent from my iPhone
>
>
>
> --
> Edward J. Yoon (@eddieyoon)
> Chief Executive Officer
> DataSayer, Inc.


Re: Cutting a 0.7 release

2014-02-24 Thread Edward J. Yoon
1) The MapReduce model uses file-based communication, so each mapper can run
separately. For example, to run an MR job on 1 GB of input data, 5 mappers
will be scheduled. Even if there are only 2 task slots (a single machine), the
MR job is slow but works: 2 running map tasks, 3 pending map tasks.

However, unlike MapReduce, BSP uses network-based communication. This means
that all BSP tasks must run at once, and the number of BSP tasks is determined
by the number of input blocks. So you CANNOT run 1 GB of input data on a
single machine. It's not a memory issue.

> throw OOM exceptions, instead it may eventually process items slower (with
> caches / queues) but never throw an exception for that but that's just my

I hope so too, but I think you are describing iterative MapReduce.

2) The normal HDFS block size is 64 ~ 256 MB. If we can assume that the split
size equals the block size, I feel the current system is enough.
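To make the arithmetic concrete (a back-of-the-envelope sketch only, not Hama's
scheduler; the 64 MB block size and the class name are assumptions for the example):

// With network-based communication every BSP task must be live at the same
// time, so the task count and concurrent memory demand follow directly from
// the number of input blocks.
public class BspTaskEstimate {
  public static void main(String[] args) {
    long inputBytes = 1L << 30;   // 1 GB input, as in the example above
    long blockBytes = 64L << 20;  // assumed 64 MB HDFS block (= split) size
    int taskSlots = 2;            // a single small machine

    long bspTasks = (inputBytes + blockBytes - 1) / blockBytes; // 16 tasks
    System.out.println("BSP tasks needed concurrently: " + bspTasks);
    System.out.println("Task slots available:          " + taskSlots);
    // MapReduce could queue the remaining mappers; BSP cannot, because all
    // 16 peers have to exchange messages within the same superstep.
  }
}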

I don't think we need to spend time implementing anything disk-based.

WDYT?

On Tue, Feb 25, 2014 at 12:19 AM, Anastasis Andronidis
 wrote:
> On 24 Feb 2014, at 3:32 p.m., Tommaso Teofili  wrote:
>
>>>
>>> According to my personal evaluations, current system is fairly
>>> respectable. As I mentioned before, I believe we should stick to
>>> in-memory style since the today's machines can be equipped with up to
>>> 128 GB. Disk (or disk hybrid) based queue is a optional, not a
>>> must-have.
>>>
>>
>> right, the only thing that I think we need to address before 0.7.0 is
>> related to the OutOfMemory errors (especially when dealing with large
>> graphs); for example IMHO even if the memory is not enough to store all the
>> graph vertices assigned to a certain peer, a scalable system should never
>> throw OOM exceptions, instead it may eventually process items slower (with
>> caches / queues) but never throw an exception for that but that's just my
>> opinion.
>>
>
> I like and agree with this.
>
> Cheers,
> Anastasis
>



-- 
Edward J. Yoon (@eddieyoon)
Chief Executive Officer
DataSayer, Inc.


Build failed in Jenkins: Hama-Nightly-for-Hadoop-1.x #1192

2014-02-24 Thread Apache Jenkins Server
See 

--
[...truncated 103085 lines...]
14/02/25 00:52:29 INFO kmeans.KMeansBSP: Finished! Writing the assignments...
14/02/25 00:52:29 INFO kmeans.KMeansBSP: Done.
14/02/25 00:52:29 INFO kmeans.KMeansBSP: Finished! Writing the assignments...
14/02/25 00:52:29 INFO kmeans.KMeansBSP: Done.
14/02/25 00:52:29 INFO kmeans.KMeansBSP: Finished! Writing the assignments...
14/02/25 00:52:29 INFO kmeans.KMeansBSP: Done.
14/02/25 00:52:32 INFO bsp.BSPJobClient: Current supersteps number: 1
14/02/25 00:52:32 INFO bsp.BSPJobClient: The total number of supersteps: 1
14/02/25 00:52:32 INFO bsp.BSPJobClient: Counters: 10
14/02/25 00:52:32 INFO bsp.BSPJobClient:   org.apache.hama.bsp.JobInProgress$JobCounter
14/02/25 00:52:32 INFO bsp.BSPJobClient: SUPERSTEPS=1
14/02/25 00:52:32 INFO bsp.BSPJobClient: LAUNCHED_TASKS=3
14/02/25 00:52:32 INFO bsp.BSPJobClient:   org.apache.hama.bsp.BSPPeerImpl$PeerCounter
14/02/25 00:52:32 INFO bsp.BSPJobClient: SUPERSTEP_SUM=6
14/02/25 00:52:32 INFO bsp.BSPJobClient: MESSAGE_BYTES_TRANSFERED=522
14/02/25 00:52:32 INFO bsp.BSPJobClient: IO_BYTES_READ=8787
14/02/25 00:52:32 INFO bsp.BSPJobClient: TIME_IN_SYNC_MS=25
14/02/25 00:52:32 INFO bsp.BSPJobClient: TOTAL_MESSAGES_SENT=18
14/02/25 00:52:32 INFO bsp.BSPJobClient: TASK_INPUT_RECORDS=303
14/02/25 00:52:32 INFO bsp.BSPJobClient: TOTAL_MESSAGES_RECEIVED=18
14/02/25 00:52:32 INFO bsp.BSPJobClient: TASK_OUTPUT_RECORDS=101
{0=[50.0, 50.0]}
Partition 0: from 0 to 24
Partition 1: from 25 to 49
Partition 2: from 50 to 74
Partition 3: from 75 to 100
14/02/25 00:52:32 INFO bsp.FileInputFormat: Total input paths to process : 4
14/02/25 00:52:32 WARN bsp.BSPJobClient: No job jar file set.  User classes may not be found. See BSPJob#setJar(String) or check Your jar file.
14/02/25 00:52:32 INFO bsp.LocalBSPRunner: Setting up a new barrier for 4 tasks!
14/02/25 00:52:32 INFO bsp.BSPJobClient: Running job: job_localrunner_0001
14/02/25 00:52:32 INFO kmeans.KMeansBSP: Finished! Writing the assignments...
14/02/25 00:52:32 INFO kmeans.KMeansBSP: Finished! Writing the assignments...
14/02/25 00:52:32 INFO kmeans.KMeansBSP: Finished! Writing the assignments...
14/02/25 00:52:32 INFO kmeans.KMeansBSP: Finished! Writing the assignments...
14/02/25 00:52:32 INFO kmeans.KMeansBSP: Done.
14/02/25 00:52:32 INFO kmeans.KMeansBSP: Done.
14/02/25 00:52:32 INFO kmeans.KMeansBSP: Done.
14/02/25 00:52:32 INFO kmeans.KMeansBSP: Done.
14/02/25 00:52:35 INFO bsp.BSPJobClient: Current supersteps number: 1
14/02/25 00:52:35 INFO bsp.BSPJobClient: The total number of supersteps: 1
14/02/25 00:52:35 INFO bsp.BSPJobClient: Counters: 10
14/02/25 00:52:35 INFO bsp.BSPJobClient:   org.apache.hama.bsp.JobInProgress$JobCounter
14/02/25 00:52:35 INFO bsp.BSPJobClient: SUPERSTEPS=1
14/02/25 00:52:35 INFO bsp.BSPJobClient: LAUNCHED_TASKS=4
14/02/25 00:52:35 INFO bsp.BSPJobClient:   org.apache.hama.bsp.BSPPeerImpl$PeerCounter
14/02/25 00:52:35 INFO bsp.BSPJobClient: SUPERSTEP_SUM=8
14/02/25 00:52:35 INFO bsp.BSPJobClient: MESSAGE_BYTES_TRANSFERED=928
14/02/25 00:52:35 INFO bsp.BSPJobClient: IO_BYTES_READ=8787
14/02/25 00:52:35 INFO bsp.BSPJobClient: TIME_IN_SYNC_MS=13
14/02/25 00:52:35 INFO bsp.BSPJobClient: TOTAL_MESSAGES_SENT=32
14/02/25 00:52:35 INFO bsp.BSPJobClient: TASK_INPUT_RECORDS=303
14/02/25 00:52:35 INFO bsp.BSPJobClient: TOTAL_MESSAGES_RECEIVED=32
14/02/25 00:52:35 INFO bsp.BSPJobClient: TASK_OUTPUT_RECORDS=101
{0=[50.0, 50.0]}
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.008 sec
Killed
Running org.apache.hama.ml.regression.LogisticRegressionModelTest
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.01 sec
Running org.apache.hama.ml.regression.VectorDoubleFileInputFormatTest
14/02/25 00:52:43 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/02/25 00:52:43 WARN snappy.LoadSnappy: Snappy native library not loaded
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.009 sec
Running org.apache.hama.ml.perception.TestSmallMLPMessage
14/02/25 00:52:44 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.01 sec
Running org.apache.hama.ml.recommendation.TestOnlineCF
14/02/25 00:52:45 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/02/25 00:52:45 INFO bsp.FileInputFormat: Total input paths to process : 1
14/02/25 00:52:45 INFO bsp.FileInputFormat: Total input paths to process : 1
14/02/25 00:52:45 WARN bsp.BSPJobClient: No job jar file set.  User classes may not be found. See BSPJob#s

Build failed in Jenkins: Hama-Nightly-for-Hadoop-2.x #188

2014-02-24 Thread Apache Jenkins Server
See 

--
[...truncated 12785 lines...]
Downloaded: http://repo.maven.apache.org/maven2/xerces/xercesImpl/2.9.1/xercesImpl-2.9.1.jar (1201 KB at 15588.5 KB/sec)
Downloading: http://repo.maven.apache.org/maven2/org/codehaus/plexus/plexus-velocity/1.1.7/plexus-velocity-1.1.7.jar
Downloaded: http://repo.maven.apache.org/maven2/org/apache/maven/doxia/doxia-module-xhtml/1.2/doxia-module-xhtml-1.2.jar (15 KB at 546.0 KB/sec)
Downloading: http://repo.maven.apache.org/maven2/org/apache/velocity/velocity/1.5/velocity-1.5.jar
Downloaded: http://repo.maven.apache.org/maven2/org/apache/maven/doxia/doxia-site-renderer/1.2/doxia-site-renderer-1.2.jar (51 KB at 1754.2 KB/sec)
Downloading: http://repo.maven.apache.org/maven2/oro/oro/2.0.8/oro-2.0.8.jar
Downloaded: http://repo.maven.apache.org/maven2/org/apache/maven/doxia/doxia-module-fml/1.2/doxia-module-fml-1.2.jar (37 KB at 1369.3 KB/sec)
Downloading: http://repo.maven.apache.org/maven2/org/apache/maven/reporting/maven-reporting-api/3.0/maven-reporting-api-3.0.jar
Downloaded: http://repo.maven.apache.org/maven2/org/codehaus/plexus/plexus-i18n/1.0-beta-7/plexus-i18n-1.0-beta-7.jar (11 KB at 380.1 KB/sec)
Downloading: http://repo.maven.apache.org/maven2/org/codehaus/plexus/plexus-utils/3.0.8/plexus-utils-3.0.8.jar
Downloaded: http://repo.maven.apache.org/maven2/org/codehaus/plexus/plexus-velocity/1.1.7/plexus-velocity-1.1.7.jar (8 KB at 267.4 KB/sec)

Implementation of DoubleVector/DenseDoubleVector/SparseDoubleVector

2014-02-24 Thread Yexi Jiang
Hi, All,

I am currently working on the SparseDoubleVector (HAMA-863) and found some
unclear points in the vector implementation.

1. What is the intended definition of sqrt for a vector? According to the
implementation, it is elementwise sqrt. In that case, a problem will occur if
one of the entries is negative.

2. Most of the operators operate on a copy of the current object. Do we also
need to provide a set of operators that directly modify the current object
itself, e.g., addOriginal, subtractOriginal, etc.?

3. When a DenseDoubleVector operates with a SparseDoubleVector, what should
the concrete type of the result be? A simple implementation always returns a
SparseDoubleVector, even if the result is dense. A more elaborate
implementation maintains a sparsity ratio (the ratio of non-default entries);
if the ratio exceeds a threshold, a DenseDoubleVector is returned (a rough
sketch of this idea follows after point 4).

4. Should the toArray method be available for SparseDoubleVector? In my
opinion, it is better not to provide it.
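For point 3, here is a minimal sketch of the sparsity-ratio decision (the class
name and the 0.5 threshold are hypothetical, not part of HAMA's API):

// Hypothetical helper: decide the result type of a dense-op-sparse operation
// based on how many entries differ from the sparse vector's default value.
public final class VectorResultPolicy {
  // Assumed threshold: above this fraction of non-default entries the result
  // is treated as dense.
  private static final double DENSITY_THRESHOLD = 0.5;

  public static boolean shouldReturnDense(double[] resultEntries, double defaultValue) {
    int nonDefault = 0;
    for (double v : resultEntries) {
      if (v != defaultValue) {
        nonDefault++;
      }
    }
    double sparsityRatio = (double) nonDefault / resultEntries.length;
    return sparsityRatio > DENSITY_THRESHOLD;  // dense result if ratio exceeds threshold
  }
}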


Regards,
Yexi


Re: Cutting a 0.7 release

2014-02-24 Thread Anastasis Andronidis
On 24 Feb 2014, at 3:32 p.m., Tommaso Teofili  wrote:

>> 
>> According to my personal evaluations, current system is fairly
>> respectable. As I mentioned before, I believe we should stick to
>> in-memory style since the today's machines can be equipped with up to
>> 128 GB. Disk (or disk hybrid) based queue is a optional, not a
>> must-have.
>> 
> 
> right, the only thing that I think we need to address before 0.7.0 is
> related to the OutOfMemory errors (especially when dealing with large
> graphs); for example IMHO even if the memory is not enough to store all the
> graph vertices assigned to a certain peer, a scalable system should never
> throw OOM exceptions, instead it may eventually process items slower (with
> caches / queues) but never throw an exception for that but that's just my
> opinion.
> 

I like and agree with this.

Cheers,
Anastasis



Re: Cutting a 0.7 release

2014-02-24 Thread Tommaso Teofili
2014-02-24 13:52 GMT+01:00 Edward J. Yoon :

> 0.6.4 or 0.7.0, Both are OK to me.
>
> Just FYI,
>
> The memory efficiency has been significantly (almost x2-3) improved by
> runtime message serialization and compression. See
>
> https://wiki.apache.org/hama/Benchmarks#PageRank_Performance_0.7.0-SNAPSHOT_vs_0.6.3
> (I'll attach more benchmarks and comparisons with other systems result
> soon). And, we've fixed many bugs. e.g., K-Means, NeuralNetwork,
> SemiClustering, Graph's Combiners HAMA-857.
>

sure, all the above things look good to me.


>
> According to my personal evaluations, current system is fairly
> respectable. As I mentioned before, I believe we should stick to
> in-memory style since the today's machines can be equipped with up to
> 128 GB. Disk (or disk hybrid) based queue is a optional, not a
> must-have.
>

Right, the only thing I think we need to address before 0.7.0 is the
OutOfMemory errors (especially when dealing with large graphs). For example,
IMHO even if the memory is not enough to store all the graph vertices assigned
to a certain peer, a scalable system should never throw OOM exceptions; it may
eventually process items more slowly (with caches / queues), but it should
never throw an exception for that. That's just my opinion.
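Purely as an illustration of the cache/queue idea (this is not Hama code; the
class name, threshold, and String messages are assumptions, and read-back from
the spill file is omitted for brevity):

// Hypothetical spilling queue: keeps up to maxInMemory messages on the heap
// and appends the rest to a local temp file instead of failing with OOM.
import java.io.*;
import java.util.ArrayDeque;
import java.util.Deque;

public class SpillingMessageQueue implements Closeable {
  private final int maxInMemory;
  private final Deque<String> inMemory = new ArrayDeque<>();
  private final File spillFile;
  private final BufferedWriter spillWriter;
  private long spilled = 0;

  public SpillingMessageQueue(int maxInMemory) throws IOException {
    this.maxInMemory = maxInMemory;
    this.spillFile = File.createTempFile("hama-msg-spill", ".txt");
    this.spillWriter = new BufferedWriter(new FileWriter(spillFile));
  }

  public void add(String message) throws IOException {
    if (inMemory.size() < maxInMemory) {
      inMemory.addLast(message);   // fast path: stay in memory
    } else {
      spillWriter.write(message);  // slow path: spill to local disk
      spillWriter.newLine();
      spilled++;
    }
  }

  public long spilledCount() {
    return spilled;
  }

  @Override
  public void close() throws IOException {
    spillWriter.close();
    spillFile.delete();
  }
}

Processing then gets slower once spilling starts, but the peer never dies with
an OutOfMemoryError just because the incoming message volume exceeds the heap.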


>
> Once we release this one, we finally might want to focus on below issues:
>
> * Fault tolerant job processing (checkpoint recovery)
>

+1


> * Support GPUs and InfiniBand
>

+1 for the former, not sure about the latter.


>
> Then, I think we can release version 1.0.
>

My 2 cents,
Tommaso


>
> On Mon, Feb 24, 2014 at 8:44 PM, Tommaso Teofili
>  wrote:
> > Would you cut 0.7 or 0.6.4 ?
> > I'd go with 0.6.4 as I think the next minor version change should be due
> to
> > significant feature additions / changes and / or stability / scalability
> > improvements.
> >
> > Regards,
> > Tommaso
> >
> >
> > 2014-02-24 8:47 GMT+01:00 Edward J. Yoon :
> >
> >> Hi all,
> >>
> >> I plan on cutting a release next week. If you have some opinions, Pls
> feel
> >> free to comment here.
> >>
> >> Sent from my iPhone
>
>
>
> --
> Edward J. Yoon (@eddieyoon)
> Chief Executive Officer
> DataSayer, Inc.
>


Re: Cutting a 0.7 release

2014-02-24 Thread Edward J. Yoon
0.6.4 or 0.7.0, both are OK with me.

Just FYI,

Memory efficiency has been significantly improved (almost 2-3x) by runtime
message serialization and compression. See
https://wiki.apache.org/hama/Benchmarks#PageRank_Performance_0.7.0-SNAPSHOT_vs_0.6.3
(I'll attach more benchmarks and comparisons with other systems soon). And
we've fixed many bugs, e.g., K-Means, NeuralNetwork, SemiClustering, and the
Graph Combiners issue HAMA-857.

According to my personal evaluations, the current system is fairly
respectable. As I mentioned before, I believe we should stick to the in-memory
style, since today's machines can be equipped with up to 128 GB of RAM. A
disk-based (or disk-hybrid) queue is optional, not a must-have.

Once we release this one, we might finally want to focus on the issues below:

* Fault tolerant job processing (checkpoint recovery)
* Support GPUs and InfiniBand

Then, I think we can release version 1.0.

On Mon, Feb 24, 2014 at 8:44 PM, Tommaso Teofili
 wrote:
> Would you cut 0.7 or 0.6.4 ?
> I'd go with 0.6.4 as I think the next minor version change should be due to
> significant feature additions / changes and / or stability / scalability
> improvements.
>
> Regards,
> Tommaso
>
>
> 2014-02-24 8:47 GMT+01:00 Edward J. Yoon :
>
>> Hi all,
>>
>> I plan on cutting a release next week. If you have some opinions, Pls feel
>> free to comment here.
>>
>> Sent from my iPhone



-- 
Edward J. Yoon (@eddieyoon)
Chief Executive Officer
DataSayer, Inc.


Re: Cutting a 0.7 release

2014-02-24 Thread Tommaso Teofili
Would you cut 0.7 or 0.6.4?
I'd go with 0.6.4, as I think the next minor version change should be driven by
significant feature additions / changes and / or stability / scalability
improvements.

Regards,
Tommaso


2014-02-24 8:47 GMT+01:00 Edward J. Yoon :

> Hi all,
>
> I plan on cutting a release next week. If you have some opinions, Pls feel
> free to comment here.
>
> Sent from my iPhone