> On Aug. 30, 2013, 11:33 p.m., Avery Ching wrote:
> > giraph-core/src/main/java/org/apache/giraph/utils/io/BigDataOutput.java, line 45
> > <https://reviews.apache.org/r/13909/diff/1/?file=346572#file346572line45>
> >
> >     Should this be bigger than 32 MB? If we are hitting the 2 GB barrier,
> >     then we will have 64 buffers just to get to 2 GB. Maybe 64 MB? Would this
> >     help reduce the overhead?
>
> Maja Kabiljo wrote:
>     I don't believe that having that few buffers, compared to their size, can
>     add any visible overhead. I think the overhead comes from having to
>     do the checks all the time. With one application which uses a lot of
>     memory I tried 256 MB chunks and it crashed, while 32 MB ran fine.

Sounds good.


- Avery


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13909/#review25811
-----------------------------------------------------------


On Sept. 2, 2013, 6:03 p.m., Maja Kabiljo wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/13909/
> -----------------------------------------------------------
>
> (Updated Sept. 2, 2013, 6:03 p.m.)
>
>
> Review request for giraph.
>
>
> Bugs: GIRAPH-752
>     https://issues.apache.org/jira/browse/GIRAPH-752
>
>
> Repository: giraph-git
>
>
> Description
> -------
>
> We've seen before that we crash when a vertex receives a lot of messages
> and we don't use a combiner. That is because the total size of the
> serialized messages for that vertex is bigger than the allowed size of an
> array.
> We should implement an OutputStream which can handle data of arbitrary size
> and add an option to use that kind of stream for messages.
>
>
> Diffs
> -----
>
>   giraph-core/src/main/java/org/apache/giraph/comm/messages/ByteArrayMessagesPerVertexStore.java 6518da6
>   giraph-core/src/main/java/org/apache/giraph/comm/messages/MessagesIterable.java a466a8d
>   giraph-core/src/main/java/org/apache/giraph/comm/messages/out_of_core/PartitionDiskBackedMessageStore.java 7b3e548
>   giraph-core/src/main/java/org/apache/giraph/comm/messages/out_of_core/SequentialFileMessageStore.java 64031c3
>   giraph-core/src/main/java/org/apache/giraph/comm/messages/primitives/IntByteArrayMessageStore.java 597e7af
>   giraph-core/src/main/java/org/apache/giraph/comm/messages/primitives/LongByteArrayMessageStore.java 3fe6356
>   giraph-core/src/main/java/org/apache/giraph/conf/GiraphConstants.java 604729a
>   giraph-core/src/main/java/org/apache/giraph/conf/ImmutableClassesGiraphConfiguration.java 2506c21
>   giraph-core/src/main/java/org/apache/giraph/utils/ByteArrayIterable.java cf2c187
>   giraph-core/src/main/java/org/apache/giraph/utils/ByteArrayIterator.java 76ed789
>   giraph-core/src/main/java/org/apache/giraph/utils/ByteArrayVertexIdMessages.java 56cc01c
>   giraph-core/src/main/java/org/apache/giraph/utils/Factory.java PRE-CREATION
>   giraph-core/src/main/java/org/apache/giraph/utils/RepresentativeByteArrayIterable.java e3992ed
>   giraph-core/src/main/java/org/apache/giraph/utils/RepresentativeByteArrayIterator.java b6151c5
>   giraph-core/src/main/java/org/apache/giraph/utils/io/BigDataInput.java PRE-CREATION
>   giraph-core/src/main/java/org/apache/giraph/utils/io/BigDataInputOutput.java PRE-CREATION
>   giraph-core/src/main/java/org/apache/giraph/utils/io/BigDataOutput.java PRE-CREATION
>   giraph-core/src/main/java/org/apache/giraph/utils/io/DataInputOutput.java PRE-CREATION
>   giraph-core/src/main/java/org/apache/giraph/utils/io/ExtendedDataInputOutput.java PRE-CREATION
>   giraph-core/src/main/java/org/apache/giraph/utils/io/package-info.java PRE-CREATION
>
> Diff:
https://reviews.apache.org/r/13909/diff/
>
>
> Testing
> -------
>
> Ran a job which fails with the original code, and also when the new option
> is not used, and verified that it works properly when the option is used.
> Also compared the performance with and without the change: with the option
> off it's the same; with the option turned on it seems to add about 5%
> overhead.
> mvn clean verify
>
>
> Thanks,
>
> Maja Kabiljo
>
>
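
For readers following the thread, here is a minimal sketch of the chunked-buffer idea the description refers to: an OutputStream that spreads its bytes over several fixed-size arrays, so the total can exceed what a single Java array can hold. This is not the BigDataOutput code from the patch. The class name ChunkedByteOutput and its methods (size, chunkCount, byteAt) are hypothetical and exist only for illustration.

```java
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;

/**
 * Illustrative only: a hypothetical chunked output stream, not Giraph's
 * BigDataOutput. Bytes are stored in fixed-size chunks, so the total size
 * is tracked as a long and is not bounded by the maximum array length.
 */
final class ChunkedByteOutput extends OutputStream {
  private final int chunkSize;
  private final List<byte[]> chunks = new ArrayList<>();
  private int posInLast; // bytes already used in the last chunk
  private long size;     // total bytes written across all chunks

  ChunkedByteOutput(int chunkSize) {
    if (chunkSize <= 0) {
      throw new IllegalArgumentException("chunkSize must be positive");
    }
    this.chunkSize = chunkSize;
  }

  @Override
  public void write(int b) {
    ensureRoom();
    chunks.get(chunks.size() - 1)[posInLast++] = (byte) b;
    size++;
  }

  @Override
  public void write(byte[] src, int off, int len) {
    if (off < 0 || len < 0 || len > src.length - off) {
      throw new IndexOutOfBoundsException();
    }
    while (len > 0) {
      ensureRoom();
      // Copy as much as fits into the current chunk, then move on.
      int n = Math.min(len, chunkSize - posInLast);
      System.arraycopy(src, off, chunks.get(chunks.size() - 1), posInLast, n);
      posInLast += n;
      off += n;
      len -= n;
      size += n;
    }
  }

  /** This per-write check is the kind of extra work discussed in the review. */
  private void ensureRoom() {
    if (chunks.isEmpty() || posInLast == chunkSize) {
      chunks.add(new byte[chunkSize]);
      posInLast = 0;
    }
  }

  long size() {
    return size;
  }

  int chunkCount() {
    return chunks.size();
  }

  byte byteAt(long index) {
    if (index < 0 || index >= size) {
      throw new IndexOutOfBoundsException("index " + index);
    }
    return chunks.get((int) (index / chunkSize))[(int) (index % chunkSize)];
  }

  public static void main(String[] args) {
    ChunkedByteOutput out = new ChunkedByteOutput(4); // tiny chunks for the demo
    out.write(new byte[10], 0, 10);
    System.out.println(out.size() + " bytes in " + out.chunkCount() + " chunks");
  }
}
```

With the 32 MB chunk size from the review, reaching 2 GB takes 2^31 / 2^25 = 64 chunks, which matches the figure Avery quotes. The sketch allocates each chunk at full size up front; whether the actual patch does the same is not stated in this thread.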
