[jira] [Updated] (HAMA-767) [GSoC 2013] Vertex addition and removal

2013-07-18 Thread Anastasis Andronidis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAMA-767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anastasis Andronidis updated HAMA-767:
--

Attachment: HAMA-767-v3.patch

Hello, I made some heavy modifications to doAggregationUpdates(). I created a 
new interface for a new type of class (PeerAggregator) that acts like a vertex 
aggregator but has access to GraphJobRunner. I moved all the logic that applies 
to globalUpdateCounts inside FinishPeerAggregator. I also used the same 
approach to solve the problem with vertex counting (see 
CountVerticesPeerAggregator). All tests are passing. I tried to change nothing 
in the workflow, but unfortunately I couldn't skip the sync for 
globalUpdateCounts as before (we are doing 2 supersteps per iteration due to 
aggregators). 

I still have lots to do, I just want to know if you like this approach.
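For discussion, here is a minimal sketch of what a peer-level aggregator along these lines could look like. Only the names PeerAggregator, CountVerticesPeerAggregator and GraphJobRunner come from the patch; the method shapes below are assumptions, not the patch's actual API:

```java
// Hypothetical sketch of the PeerAggregator idea -- the real interface in
// HAMA-767-v3.patch may differ. Unlike a per-vertex Aggregator, a peer-level
// aggregator works on values each BSP peer reports, so runner-level state
// such as globalUpdateCounts can live behind it.
interface PeerAggregator {
  /** Called once per peer per superstep; returns this peer's local value. */
  long getLocalValue();

  /** Combines the values reported by all peers into one global value. */
  long aggregate(Iterable<Long> peerValues);
}

/** Global vertex counting, in the spirit of CountVerticesPeerAggregator. */
class VertexCountingAggregator implements PeerAggregator {
  private final long localVertexCount;

  VertexCountingAggregator(long localVertexCount) {
    this.localVertexCount = localVertexCount;
  }

  @Override
  public long getLocalValue() {
    return localVertexCount; // each peer reports its own vertex count
  }

  @Override
  public long aggregate(Iterable<Long> peerValues) {
    long sum = 0;
    for (long v : peerValues) {
      sum += v; // global count = sum of all per-peer counts
    }
    return sum;
  }
}
```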

> [GSoC 2013] Vertex addition and removal
> ---
>
> Key: HAMA-767
> URL: https://issues.apache.org/jira/browse/HAMA-767
> Project: Hama
>  Issue Type: New Feature
>  Components: examples, graph
>Affects Versions: 0.6.1
>Reporter: Anastasis Andronidis
>Assignee: Anastasis Andronidis
>  Labels: dynamic, graph, gsoc, gsoc2013, mentoring
> Fix For: 0.6.3
>
> Attachments: HAMA-767-addAndRemove-v1.patch, 
> HAMA-767-addition-v1.patch, HAMA-767-examplesAndTests-v1.patch, 
> HAMA-767-v2.patch, HAMA-767-v3.patch
>
>
> Implement addVertex and removeVertex methods for incremental graph support on 
> Graph API.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Jenkins build is back to normal : Hama-Nightly-for-Hadoop-1.x #977

2013-07-18 Thread Apache Jenkins Server



[jira] [Updated] (HAMA-782) The arguments of DoubleVector.slice(int, int) method will mislead the user

2013-07-18 Thread Yexi Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAMA-782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yexi Jiang updated HAMA-782:


Attachment: HAMA-782.patch

Add more test cases.

> The arguments of DoubleVector.slice(int, int) method will mislead the user
> --
>
> Key: HAMA-782
> URL: https://issues.apache.org/jira/browse/HAMA-782
> Project: Hama
>  Issue Type: Improvement
>  Components: machine learning, math
>Reporter: Yexi Jiang
>Assignee: Yexi Jiang
> Fix For: 0.6.3
>
> Attachments: HAMA-782.patch, HAMA-782.patch
>
>
> The current implementation of DoubleVector.slice(int, int) is ambiguous.
> Current description of this method is as follows:
> 
> Slices this vector from index offset with the given length. So you end at the 
> upper bound of (offset+length).
> 
> If the given vector is vec = [0, 1, 2, 3, 4, 5, 6] and the user calls 
> vec.slice(2, 3) hoping to get [2, 3, 4], it actually returns [2, 3].
> This is because the actual implementation extracts the elements starting at 
> 'offset' and ending at 'length' (exclusive). The argument name misleads the 
> user.



[jira] [Updated] (HAMA-782) The arguments of DoubleVector.slice(int, int) method will mislead the user

2013-07-18 Thread Yexi Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAMA-782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yexi Jiang updated HAMA-782:


Attachment: HAMA-782.patch

1. Change the argument name and comments to make them clearer.
2. Add a pre-condition check.
3. Add test cases.

> The arguments of DoubleVector.slice(int, int) method will mislead the user
> --
>
> Key: HAMA-782
> URL: https://issues.apache.org/jira/browse/HAMA-782
> Project: Hama
>  Issue Type: Improvement
>  Components: machine learning, math
>Reporter: Yexi Jiang
>Assignee: Yexi Jiang
> Fix For: 0.6.3
>
> Attachments: HAMA-782.patch
>
>
> The current implementation of DoubleVector.slice(int, int) is ambiguous.
> Current description of this method is as follows:
> 
> Slices this vector from index offset with the given length. So you end at the 
> upper bound of (offset+length).
> 
> If the given vector is vec = [0, 1, 2, 3, 4, 5, 6] and the user calls 
> vec.slice(2, 3) hoping to get [2, 3, 4], it actually returns [2, 3].
> This is because the actual implementation extracts the elements starting at 
> 'offset' and ending at 'length' (exclusive). The argument name misleads the 
> user.



[jira] [Updated] (HAMA-782) The arguments of DoubleVector.slice(int, int) method will mislead the user

2013-07-18 Thread Yexi Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAMA-782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yexi Jiang updated HAMA-782:


Status: Patch Available  (was: Open)

1. Change the argument name and comments to make them clearer.
2. Add a pre-condition check.
3. Add test cases.

> The arguments of DoubleVector.slice(int, int) method will mislead the user
> --
>
> Key: HAMA-782
> URL: https://issues.apache.org/jira/browse/HAMA-782
> Project: Hama
>  Issue Type: Improvement
>  Components: machine learning, math
>Reporter: Yexi Jiang
>Assignee: Yexi Jiang
> Fix For: 0.6.3
>
> Attachments: HAMA-782.patch
>
>
> The current implementation of DoubleVector.slice(int, int) is ambiguous.
> Current description of this method is as follows:
> 
> Slices this vector from index offset with the given length. So you end at the 
> upper bound of (offset+length).
> 
> If the given vector is vec = [0, 1, 2, 3, 4, 5, 6] and the user calls 
> vec.slice(2, 3) hoping to get [2, 3, 4], it actually returns [2, 3].
> This is because the actual implementation extracts the elements starting at 
> 'offset' and ending at 'length' (exclusive). The argument name misleads the 
> user.



[jira] [Created] (HAMA-782) The arguments of DoubleVector.slice(int, int) method will mislead the user

2013-07-18 Thread Yexi Jiang (JIRA)
Yexi Jiang created HAMA-782:
---

 Summary: The arguments of DoubleVector.slice(int, int) method will 
mislead the user
 Key: HAMA-782
 URL: https://issues.apache.org/jira/browse/HAMA-782
 Project: Hama
  Issue Type: Improvement
  Components: machine learning, math
Reporter: Yexi Jiang
Assignee: Yexi Jiang
 Fix For: 0.6.3


The current implementation of DoubleVector.slice(int, int) is ambiguous.
Current description of this method is as follows:


Slices this vector from index offset with the given length. So you end at the 
upper bound of (offset+length).


If the given vector is vec = [0, 1, 2, 3, 4, 5, 6] and the user calls 
vec.slice(2, 3) hoping to get [2, 3, 4], it actually returns [2, 3].

This is because the actual implementation extracts the elements starting at 
'offset' and ending at 'length' (exclusive). The argument name misleads the 
user.




Re: Dynamic vertices and hama counters

2013-07-18 Thread Chia-Hung Lin
Sorry, my bad. I only focused on the counter stuff and didn't pay attention
to the Vertex-related issue; I thought you just wanted to share a counter
value between peers. In that case, persisting the counter value to ZK
shouldn't be a problem, and won't incur overhead. But if the case is not
about counters, please just ignore my previous post.


On 17 July 2013 06:59, Edward J. Yoon  wrote:
> You guys seem to have totally misunderstood what I am saying.
>
> Would every BSP processor access ZK's counter concurrently? Do you
> think it is possible to determine the current total number of vertices
> at every step without barrier synchronization?
>
> As I mentioned before, there are already additional barrier
> synchronization steps for aggregating and broadcasting the global updated
> vertex count. You can reuse these steps with *no additional barrier
> synchronization*.
>
> On Wed, Jul 17, 2013 at 5:01 AM, andronat_asf  
> wrote:
>> Thank you everyone,
>>
>> +1 for Tommaso, I will see what I can do about that :)
>>
>> I also believe that ZK is very similar to the sync() mechanism that Edward 
>> describes, but if we need to sync more info we might need ZK.
>>
>> Thanks again,
>> Anastasis
>>
>> On 15 Ιουλ 2013, at 5:55 μ.μ., Edward J. Yoon  wrote:
>>
>>> andronat_asf,
>>>
>>> To aggregate and broadcast the global count of updated vertices, we
>>> call sync() twice. See the doAggregationUpdates() method in
>>> GraphJobRunner. You can solve your problem the same way, and there
>>> will be no additional cost.
>>>
>>> Using ZooKeeper is not a bad idea. But IMO, it's not much different
>>> from the sync() mechanism.
>>>
>>> On Mon, Jul 15, 2013 at 10:05 PM, Chia-Hung Lin  
>>> wrote:
 +1 for Tommaso's solution.

 If not every algorithm needs counter service, having an interface with
 different implementations (in-memory, zk, etc.) should reduce the side
 effect.


 On 15 July 2013 15:51, Tommaso Teofili  wrote:
> what about introducing a proper API for counting vertices, something like
> an interface VertexCounter with 2-3 implementations like
> InMemoryVertexCounter (basically the current one), a
> DistributedVertexCounter to implement the scenario where we use a separate
> BSP superstep to count them, and a ZKVertexCounter which handles vertex
> counts as per Chia-Hung's suggestion.
>
> Also we may introduce something like a configuration variable to define if
> all the vertices are needed or just the neighbors (and/or some other
> strategy).
>
> My 2 cents,
> Tommaso
>
> 2013/7/14 Chia-Hung Lin 
>
>> Just my personal viewpoint. For small size of global information,
>> considering to store the state in ZooKeeper might be a reasonable
>> solution.
>>
>> On 13 July 2013 21:28, andronat_asf  wrote:
>>> Hello everyone,
>>>
>>> I'm working on HAMA-767 and I have some concerns about counters and
>> scalability. Currently, every peer has a set of vertices and a variable
>> that keeps the total number of vertices across all peers. In my case,
>> I'm trying to add and remove vertices during the runtime of a job, which
>> means that I have to update all those variables.
>>>
>>> My problem is that this is not efficient, because on every operation (adding
>> or removing a vertex) I need to update all peers, so I need to send lots of
>> messages to make those updates (see the GraphJobRunner#countGlobalVertexCount
>> method), and I believe this is neither correct nor scalable. Another problem
>> is that, even if I update all those variables (at the cost of sending lots
>> of messages to every peer), those variables will only be updated on the next
>> superstep.
>>>
>>> e.g.:
>>>
>>> Peer 1:Peer 2:
>>>  Vert_1  Vert_2
>>> (Total_V = 2)  (Total_V = 2)
>>> addVertex()
>>> (Total_V = 3)
>>> getNumberOfV() => 2
>>>
>>>  Sync 
>>>
>>> getNumberOfV() => 3
>>>
>>>
>>> Is there something like global counters or shared memory that it can
>> address this issue?
>>>
>>> P.S. I have a small feeling that we don't need to track the total number
>> of vertices, because vertex-centric algorithms rarely need global totals;
>> they usually only depend on neighbors (I might be wrong though).
>>>
>>> Thanks,
>>> Anastasis
>>
>>>
>>>
>>>
>>> --
>>> Best Regards, Edward J. Yoon
>>> @eddieyoon
>>>
>>
>
>
>
> --
> Best Regards, Edward J. Yoon
> @eddieyoon
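To make the proposal concrete, here is a rough sketch of the VertexCounter API discussed above; all method names and signatures are assumptions, not existing Hama code:

```java
import java.util.List;

/** Counting strategy behind a common interface, as proposed in the thread. */
interface VertexCounter {
  void add(long delta);   // called by addVertex()/removeVertex()
  long getGlobalCount();  // total number of vertices across all peers
}

/**
 * Roughly the current behaviour: each peer buffers a local delta, and the
 * global count only refreshes at the barrier -- which is why getNumberOfV()
 * in the Peer 1 / Peer 2 example above lags by one sync.
 */
class InMemoryVertexCounter implements VertexCounter {
  private long localDelta = 0;
  private long globalCount;

  InMemoryVertexCounter(long initialGlobalCount) {
    this.globalCount = initialGlobalCount;
  }

  @Override
  public void add(long delta) {
    localDelta += delta; // visible globally only after the next sync
  }

  @Override
  public long getGlobalCount() {
    return globalCount; // possibly stale between barriers
  }

  /** What this peer contributes at the barrier. */
  long takeLocalDelta() {
    long d = localDelta;
    localDelta = 0;
    return d;
  }

  /** Applies the deltas gathered from every peer at the barrier. */
  void sync(List<Long> allPeerDeltas) {
    for (long d : allPeerDeltas) {
      globalCount += d;
    }
  }
}
```

A DistributedVertexCounter or ZKVertexCounter would implement the same two public methods but move the sync step into an extra superstep or into ZooKeeper, respectively.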


[jira] [Commented] (HAMA-780) New launched child processes by fault tolerance may not be able to contact each other

2013-07-18 Thread MaoYuan Xian (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712281#comment-13712281
 ] 

MaoYuan Xian commented on HAMA-780:
---

Yes, for the requirements of a company product.
Besides this problem, I have found another bug.
I also made an initial implementation to handle the situation when one groom 
server goes down: get rid of the down groom and recover the child tasks that 
were running on it.

> New launched child processes by fault tolerance may not be able to contact 
> each other
> -
>
> Key: HAMA-780
> URL: https://issues.apache.org/jira/browse/HAMA-780
> Project: Hama
>  Issue Type: Bug
>  Components: bsp core
>Affects Versions: 0.6.2
>Reporter: MaoYuan Xian
> Attachments: HAMA-780.patch
>
>
> When fault tolerance is enabled, the recovery process sometimes fails because 
> the newly launched child processes cannot send messages to each other.
> I finally found the cause:
> On one hand, when a new child process is launched for recovery, its port is 
> set via the following logic:
> {code}
>   final BSPTask task = (BSPTask) umbilical.getTask(taskid);
>   int peerPort = umbilical.getAssignedPortNum(taskid);
>   ...
>   defaultConf.setInt(Constants.PEER_PORT, peerPort);
> {code}
> This logic finds the lowest available port for the new process:
> {code}
>   public static int getNextAvailable(int fromPort) {
> ...
> for (int i = fromPort + 1; i <= MAX_PORT_NUMBER; i++) {
>   if (available(i)) {
> return i;
>   }
> }
> ...
>   }
> {code}
> Here is a use case:
> Run one job with 3 child tasks, listening on hostname:61001, 
> hostname:61002 and hostname:61003.
> Suppose the task listening on hostname:61002 fails (because of a disk problem 
> or being killed by the system's memory protection program); port 61002 is now 
> released.
> The recovery process starts, triggers three new processes, and assigns them 
> the addresses hostname:61002, hostname:61004 and hostname:61005 (61001 and 
> 61003 are still held by the old child tasks until they quit).
> During this recovery phase, we can find the /bsp/job_id/peers directory in 
> zookeeper is something like
> {quote}
> hostname:61001, hostname:61002, hostname:61005, hostname:61003, hostname:61004
> {quote}
> On the other hand, the newly launched child processes try to find each other 
> through ZooKeeper when they start (in BSPPeerImpl.java):
> {code}
>   private final void initPeerNames() {
> if (allPeers == null) {
>   allPeers = syncClient.getAllPeerNames(taskId);
> }
>   }
> {code}
> {code}
>   public String[] getAllPeerNames(TaskAttemptID taskId) {
> if (allPeers == null) {
>   TreeMap<Integer, String> sortedMap = new TreeMap<Integer, String>();
>   try {
> List<String> var = zk.getChildren(
> constructKey(taskId.getJobID(), "peers"), this);
> allPeers = var.toArray(new String[var.size()]);
> for (String s : allPeers) {
>   ...
>   boolean result = getValueFromBytes(data, thatTask);
>   if (result) {
> LOG.debug("TASK mapping from zookeeper: " + thatTask + " ID:"
> + thatTask.getTaskID().getId() + " : " + s);
> sortedMap.put(thatTask.getTaskID().getId(), s);
>   }
> }
>   } catch (Exception e) {
> LOG.error(e);
> throw new RuntimeException("All peer names could not be retrieved!");
>   }
> ...
>   }
> {code}
> Opening the log, we find the following:
> {quote}
> 13/07/13 00:03:39 DEBUG sync.ZooKeeperSyncClientImpl: TASK mapping from 
> zookeeper: attempt_201307122024_0005_01_0 ID:1 : hostname:61001
> 13/07/13 00:03:39 DEBUG sync.ZooKeeperSyncClientImpl: TASK mapping from 
> zookeeper: attempt_201307122024_0005_00_1 ID:0 : hostname:61002
> 13/07/13 00:03:39 DEBUG sync.ZooKeeperSyncClientImpl: TASK mapping from 
> zookeeper: attempt_201307122024_0005_02_1 ID:2 : hostname:61005
> 13/07/13 00:03:39 DEBUG sync.ZooKeeperSyncClientImpl: TASK mapping from 
> zookeeper: attempt_201307122024_0005_02_0 ID:2 : hostname:61003
> 13/07/13 00:03:39 DEBUG sync.ZooKeeperSyncClientImpl: TASK mapping from 
> zookeeper: attempt_201307122024_0005_01_1 ID:1 : hostname:61004
> {quote}
> The newly added peer hostname:61005 is listed before hostname:61003, which 
> makes the sortedMap in ZooKeeperSyncClientImpl contain the mapping 
> <2, hostname:61003> (the sortedMap.put(thatTask.getTaskID().getId(), s) call 
> in the code above makes this happen). The next round of communication then 
> malfunctions, because messages that should be sent to "hostname:61005" are 
> sent to "hostname:61003".
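The overwrite described above can be reproduced in isolation (the hostnames and task IDs mirror the log excerpt; the snippet is illustrative, not Hama code):

```java
import java.util.TreeMap;

class PeerMappingDemo {
  public static void main(String[] args) {
    TreeMap<Integer, String> sortedMap = new TreeMap<Integer, String>();
    // ZooKeeper lists the retry attempt _02_1 before the stale entry _02_0,
    // and both carry task ID 2, so the second put overwrites the first.
    sortedMap.put(2, "hostname:61005"); // attempt ..._02_1, the live task
    sortedMap.put(2, "hostname:61003"); // attempt ..._02_0, already dead
    // Messages addressed to peer 2 now go to the dead port.
    System.out.println(sortedMap.get(2)); // prints "hostname:61003"
  }
}
```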


Re: Issue while using DiskVerticesInfo

2013-07-18 Thread Tommaso Teofili
thanks a lot Suraj, that'd be great.
Tommaso

2013/7/18 Suraj Menon 

> I can take a look at it this weekend.
>
> -Suraj
>
>
> On Wed, Jul 17, 2013 at 6:06 AM, Tommaso Teofili
> wrote:
>
> > Yes, I assumed the DiskVerticesInfo implementation was sorting vertices
> to
> > accomplish that so I wonder if we have a bug there (or, for example, if
> > there's something else to configure).
> >
> > Maybe @Suraj could help?
> >
> > Thanks in advance,
> > Tommaso
> >
> > 2013/7/17 Edward J. Yoon 
> >
> > > Hi,
> > >
> > > Our graph package offers a Pregel-like vertex-centric programming model
> > > that allows communication between "vertices".
> > >
> > > Internally, each BSP processor performs computations for all assigned
> > > vertices.
> > >
> > >   /**
> > >* The user-defined function
> > >*/
> > >   public void compute(Iterable<M> messages) throws IOException;
> > >
> > > To avoid grouping messages in the receive queue by vertex ID, we use a
> > > sorted message queue and call the user-defined function for each vertex
> > > sequentially. This sequential processing approach reduces
> > > memory usage.
> > >
> > > The current problem is vertex loading phase (or partitioner). The
> > > loaded vertices in memory of each BSP processor should already be
> > > sorted by vertex ID. In ListVerticesInfo case,
> > >
> > >   @Override
> > >   public void finishAdditions() {
> > > Collections.sort(vertices);
> > >   }
> > >
> > > This is a quick-fix solution. We have to sort the vertices by vertex ID
> > > at the partitioning or loading phase.
> > >
> > > On Wed, Jul 17, 2013 at 4:44 PM, Tommaso Teofili
> > >  wrote:
> > > > Hi all,
> > > >
> > > > I was trying to run the TestSubmitGraphJob with DiskVerticesInfo and
> I
> > > got
> > > > this :
> > > >
> > > > 13/07/17 09:21:45 INFO graph.GraphJobRunner: 7 vertices are loaded
> into
> > > > 192.168.1.4:61001
> > > >
> > > > 13/07/17 09:21:45 ERROR bsp.BSPTask: Error running bsp setup and bsp
> > > > function.
> > > > java.lang.IllegalArgumentException: Messages must never be behind the
> > > > vertex in ID! Current Message ID: facebook.com vs. stackoverflow.com
> > > >  at
> > org.apache.hama.graph.GraphJobRunner.iterate(GraphJobRunner.java:281)
> > > > at
> > >
> org.apache.hama.graph.GraphJobRunner.doSuperstep(GraphJobRunner.java:229)
> > > >  at org.apache.hama.graph.GraphJobRunner.bsp(GraphJobRunner.java:133)
> > > > at org.apache.hama.bsp.BSPTask.runBSP(BSPTask.java:177)
> > > >  at org.apache.hama.bsp.BSPTask.run(BSPTask.java:146)
> > > > at
> > >
> org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1262)
> > > >
> > > > 13/07/17 09:21:45 INFO server.PrepRequestProcessor: Processed session
> > > > termination for sessionid: 0x13feb81547f0003
> > > >
> > > > 13/07/17 09:21:45 INFO server.NIOServerCnxn: Closed socket connection
> > for
> > > > client /0:0:0:0:0:0:0:1%0:51900 which had sessionid 0x13feb81547f0003
> > > >
> > > >
> > > > Does anyone know what could be the root cause of such a failure?
> > > >
> > > > Thanks a lot in advance,
> > > > Tommaso
> > >
> > >
> > >
> > > --
> > > Best Regards, Edward J. Yoon
> > > @eddieyoon
> > >
> >
>
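The ordering constraint Edward describes can be shown in a tiny standalone form; the vertex IDs come from the error message above, and the snippet is only illustrative:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SortedLoadDemo {
  public static void main(String[] args) {
    // Vertices as they might arrive from the partitioner, unsorted by ID.
    List<String> vertices = new ArrayList<String>();
    vertices.add("stackoverflow.com");
    vertices.add("facebook.com");

    // The ListVerticesInfo quick fix quoted above: sort once after loading,
    // so iteration order matches the sorted message queue's ID order.
    Collections.sort(vertices);

    // Without this step, a message for "facebook.com" would surface while the
    // runner is still positioned at "stackoverflow.com" -- exactly the
    // "Messages must never be behind the vertex in ID!" failure.
    System.out.println(vertices); // [facebook.com, stackoverflow.com]
  }
}
```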


Re: Issue while using DiskVerticesInfo

2013-07-18 Thread Suraj Menon
I can take a look at it this weekend.

-Suraj


On Wed, Jul 17, 2013 at 6:06 AM, Tommaso Teofili
wrote:

> Yes, I assumed the DiskVerticesInfo implementation was sorting vertices to
> accomplish that so I wonder if we have a bug there (or, for example, if
> there's something else to configure).
>
> Maybe @Suraj could help?
>
> Thanks in advance,
> Tommaso
>
> 2013/7/17 Edward J. Yoon 
>
> > Hi,
> >
> > Our graph package offers a Pregel-like vertex-centric programming model
> > that allows communication between "vertices".
> >
> > Internally, each BSP processor performs computations for all assigned
> > vertices.
> >
> >   /**
> >* The user-defined function
> >*/
> >   public void compute(Iterable<M> messages) throws IOException;
> >
> > To avoid grouping messages in the receive queue by vertex ID, we use a
> > sorted message queue and call the user-defined function for each vertex
> > sequentially. This sequential processing approach reduces
> > memory usage.
> >
> > The current problem is vertex loading phase (or partitioner). The
> > loaded vertices in memory of each BSP processor should already be
> > sorted by vertex ID. In ListVerticesInfo case,
> >
> >   @Override
> >   public void finishAdditions() {
> > Collections.sort(vertices);
> >   }
> >
> > This is a quick-fix solution. We have to sort the vertices by vertex ID
> > at the partitioning or loading phase.
> >
> > On Wed, Jul 17, 2013 at 4:44 PM, Tommaso Teofili
> >  wrote:
> > > Hi all,
> > >
> > > I was trying to run the TestSubmitGraphJob with DiskVerticesInfo and I
> > got
> > > this :
> > >
> > > 13/07/17 09:21:45 INFO graph.GraphJobRunner: 7 vertices are loaded into
> > > 192.168.1.4:61001
> > >
> > > 13/07/17 09:21:45 ERROR bsp.BSPTask: Error running bsp setup and bsp
> > > function.
> > > java.lang.IllegalArgumentException: Messages must never be behind the
> > > vertex in ID! Current Message ID: facebook.com vs. stackoverflow.com
> > >  at
> org.apache.hama.graph.GraphJobRunner.iterate(GraphJobRunner.java:281)
> > > at
> > org.apache.hama.graph.GraphJobRunner.doSuperstep(GraphJobRunner.java:229)
> > >  at org.apache.hama.graph.GraphJobRunner.bsp(GraphJobRunner.java:133)
> > > at org.apache.hama.bsp.BSPTask.runBSP(BSPTask.java:177)
> > >  at org.apache.hama.bsp.BSPTask.run(BSPTask.java:146)
> > > at
> > org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1262)
> > >
> > > 13/07/17 09:21:45 INFO server.PrepRequestProcessor: Processed session
> > > termination for sessionid: 0x13feb81547f0003
> > >
> > > 13/07/17 09:21:45 INFO server.NIOServerCnxn: Closed socket connection
> for
> > > client /0:0:0:0:0:0:0:1%0:51900 which had sessionid 0x13feb81547f0003
> > >
> > >
> > > Does anyone know what could be the root cause of such a failure?
> > >
> > > Thanks a lot in advance,
> > > Tommaso
> >
> >
> >
> > --
> > Best Regards, Edward J. Yoon
> > @eddieyoon
> >
>


[jira] [Commented] (HAMA-742) Implement of Hama RPC

2013-07-18 Thread Edward J. Yoon (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712140#comment-13712140
 ] 

Edward J. Yoon commented on HAMA-742:
-

Avro is used as a default IO serialization protocol [1].

{code}
java.lang.ClassNotFoundException: Class org.apache.avro.io.DatumWriter not found
at 
org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1486)
at 
org.apache.hadoop.io.serializer.SerializationFactory.add(SerializationFactory.java:70)
at 
org.apache.hadoop.io.serializer.SerializationFactory.<init>(SerializationFactory.java:63)
at org.apache.hadoop.io.SequenceFile$Writer.init(SequenceFile.java:1173)
at 
org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:1103)
at 
org.apache.hama.bsp.SequenceFileRecordWriter.<init>(SequenceFileRecordWriter.java:39)
{code}

1. 
http://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/serializer/SerializationFactory.java

So we need to either add the Avro dependency to lib, or add an 
"io.serializations" property to hama-default.xml.

{code}
<property>
  <name>io.serializations</name>
  <value>org.apache.hadoop.io.serializer.WritableSerialization,org.apache.hadoop.io.serializer.JavaSerialization</value>
</property>
{code}

I prefer the latter (set "io.serializations" to WritableSerialization).


{quote}why are DistributedCache.add/setLocalFiles(conf, files.toString()); 
commented out ?{quote}

Oh thanks, my fault! I have to fix this.

> Implement of Hama RPC 
> --
>
> Key: HAMA-742
> URL: https://issues.apache.org/jira/browse/HAMA-742
> Project: Hama
>  Issue Type: Sub-task
>Reporter: Edward J. Yoon
>Assignee: Edward J. Yoon
> Fix For: 0.6.3
>
> Attachments: HAMA-742_v01.patch, HAMA-742_v02.patch
>
>
> To solve the HDFS 2.0 compatibility issue, we have to change a lot of code 
> for Hadoop 2.0 RPC; moreover, YARN RPC doesn't support asynchronous calls 
> directly.
> Ultimately, we can pursue better performance and integrate more easily with 
> multiple Hadoop versions by having our own RPC.



[jira] [Commented] (HAMA-742) Implement of Hama RPC

2013-07-18 Thread Tommaso Teofili (JIRA)

[ 
https://issues.apache.org/jira/browse/HAMA-742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712106#comment-13712106
 ] 

Tommaso Teofili commented on HAMA-742:
--

I've had a quick review of the patch and I have 2 questions:
* why is the Avro dependency there? It seems to me it's not used, but correct 
me if I'm wrong
* why are _DistributedCache.add/setLocalFiles(conf, files.toString());_ 
commented out?

> Implement of Hama RPC 
> --
>
> Key: HAMA-742
> URL: https://issues.apache.org/jira/browse/HAMA-742
> Project: Hama
>  Issue Type: Sub-task
>Reporter: Edward J. Yoon
>Assignee: Edward J. Yoon
> Fix For: 0.6.3
>
> Attachments: HAMA-742_v01.patch, HAMA-742_v02.patch
>
>
> To solve the HDFS 2.0 compatibility issue, we have to change a lot of code 
> for Hadoop 2.0 RPC; moreover, YARN RPC doesn't support asynchronous calls 
> directly.
> Ultimately, we can pursue better performance and integrate more easily with 
> multiple Hadoop versions by having our own RPC.
