Thank you. What degree of parallelism are you using when submitting the job? You can set it either with the "-p" argument or with env.setDegreeOfParallelism(). How much heap space do you assign to the TaskManagers?
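To put the heap question next to the buffer settings quoted below, here is a minimal sketch. It is not part of Stratosphere: the class name BufferBudget and its method are made up for illustration. It multiplies the two network buffer values from the quoted configuration and prints the maximum heap of the JVM it runs in, which only matches a TaskManager's heap if it is started with the same -Xmx value.

```java
public class BufferBudget {

    // Total bytes taken by the network buffers of one TaskManager,
    // computed from the two configuration values (hypothetical helper).
    static long totalBufferBytes(int numberOfBuffers, int bufferSizeInBytes) {
        // Widen to long before multiplying to avoid int overflow.
        return (long) numberOfBuffers * bufferSizeInBytes;
    }

    public static void main(String[] args) {
        // Values taken from the quoted configuration:
        // taskmanager.network.numberOfBuffers and
        // taskmanager.network.bufferSizeInBytes.
        long bufferBytes = totalBufferBytes(2048, 32768);

        // Maximum heap this JVM may use (controlled by -Xmx).
        long maxHeapBytes = Runtime.getRuntime().maxMemory();

        long mib = 1024L * 1024L;
        System.out.println("Network buffers: " + bufferBytes / mib + " MiB");
        System.out.println("Max heap of this JVM: " + maxHeapBytes / mib + " MiB");
    }
}
```

With the quoted values, the buffers add up to 2048 * 32768 bytes = 64 MiB per TaskManager.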
On Sun, Jun 22, 2014 at 3:07 PM, José Luis López Pino <jllopezp...@gmail.com> wrote:

> Hi,
>
> I'm using two instances of a VPS, with this input for the program:
> - Iterations: 2
> - Dimensions: 2 (3 for the scala example program)
> - Number of centers (k): 10
>
> This is my current configuration for network buffers (I think they are
> the default values):
> # Number of network buffers (used by each TaskManager)
> taskmanager.network.numberOfBuffers: 2048
> # Size of network buffers
> taskmanager.network.bufferSizeInBytes: 32768
>
> Regards // Saludos // Mit Freundlichen Grüßen // Bien cordialement,
> Pino
>
>
> On 22 June 2014 14:19, Robert Metzger <rmetz...@apache.org> wrote:
>
>> Workers waiting in "LocalBufferPool.requestBuffer()" is usually a sign of
>> a distributed deadlock.
>> Can you send me some instructions on how to get the same input data you
>> have (download URL? generator settings?) and what configuration parameters
>> you are using (max iteration limit, k, ?) when calling the K-Means
>> example?
>> I would like to try it on our cluster.
>>
>> Just out of curiosity, what hardware are you using? Is it the IBM Power
>> cluster at TU Berlin?
>>
>> Robert
>>
>>
>> On Sun, Jun 22, 2014 at 1:53 PM, Sebastian Schelter
>> <ssc.o...@googlemail.com> wrote:
>>
>> > You could try to increase the number of buffers available to the network
>> > stack. That solved similar problems for me in the past.
>> >
>> > -s
>> > On 22.06.2014 13:48, "José Luis López Pino" <jllopezp...@gmail.com> wrote:
>> >
>> > > It seems like the thread reading the points file is locked waiting for a
>> > > buffer from the global buffer pool that doesn't come. What could be
>> > > causing this?
>> > >
>> > > java.lang.Thread.State: TIMED_WAITING (on object monitor)
>> > >   at java.lang.Object.wait(Native Method)
>> > >   - waiting on <0x6b985888> (a java.util.ArrayDeque)
>> > >   at eu.stratosphere.runtime.io.network.bufferprovider.LocalBufferPool.requestBuffer(LocalBufferPool.java:160)
>> > >   - locked <0x6b985888> (a java.util.ArrayDeque)
>> > >   at eu.stratosphere.runtime.io.network.bufferprovider.LocalBufferPool.requestBufferBlocking(LocalBufferPool.java:101)
>> > >   at eu.stratosphere.runtime.io.gates.InputGate.requestBufferBlocking(InputGate.java:333)
>> > >   at eu.stratosphere.runtime.io.channels.InputChannel.requestBufferBlocking(InputChannel.java:426)
>> > >   at eu.stratosphere.runtime.io.network.ChannelManager.dispatchFromOutputChannel(ChannelManager.java:441)
>> > >   at eu.stratosphere.runtime.io.channels.OutputChannel.sendBuffer(OutputChannel.java:74)
>> > >   at eu.stratosphere.runtime.io.gates.OutputGate.sendBuffer(OutputGate.java:49)
>> > >   at eu.stratosphere.runtime.io.api.BufferWriter.sendBuffer(BufferWriter.java:35)
>> > >   at eu.stratosphere.runtime.io.api.RecordWriter.emit(RecordWriter.java:96)
>> > >   at eu.stratosphere.pact.runtime.shipping.OutputCollector.collect(OutputCollector.java:82)
>> > >   at eu.stratosphere.pact.runtime.task.chaining.ChainedMapDriver.collect(ChainedMapDriver.java:71)
>> > >   at eu.stratosphere.pact.runtime.task.DataSourceTask.invoke(DataSourceTask.java:228)
>> > >   at eu.stratosphere.nephele.execution.RuntimeEnvironment.run(RuntimeEnvironment.java:284)
>> > >   at java.lang.Thread.run(Thread.java:744)
>> > >
>> > >
>> > > Thanks for your help Sebastian.
>> > >
>> > > Regards // Saludos // Mit Freundlichen Grüßen // Bien cordialement,
>> > > Pino
>> > >
>> > >
>> > > On 22 June 2014 13:38, Sebastian Schelter <ssc.o...@googlemail.com> wrote:
>> > >
>> > > > Have you looked at a jstack dump on one of the workers? That typically
>> > > > helps to find out where the processes are stuck.
>> > > >
>> > > > -s
>> > > > On 22.06.2014 13:32, "José Luis López Pino" <jllopezp...@gmail.com> wrote:
>> > > >
>> > > > > Hi,
>> > > > >
>> > > > > I'm running the KMeans java and scala examples on two nodes. It works
>> > > > > fine with very small files (3MB) but when I try with files of 30MB or
>> > > > > bigger the process never ends. After several hours, the DataChain
>> > > > > process that is reading the input points is still working.
>> > > > >
>> > > > > I have tried before with way bigger files in the same environment and
>> > > > > I had no issue. I have already tried:
>> > > > > - Check that the process is not locked using all the CPU time.
>> > > > > - Format the datanodes.
>> > > > > - Compile the last version available on github.
>> > > > > - The debug log mode doesn't give any additional information.
>> > > > >
>> > > > > Could someone give me a hint where to look at that? Thanks for your
>> > > > > help!
>> > > > >
>> > > > > Regards // Saludos // Mit Freundlichen Grüßen // Bien cordialement,
>> > > > > Pino