-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13909/#review25811
-----------------------------------------------------------

Ship it!

+1, awesome work that will help our super nodes.


giraph-core/src/main/java/org/apache/giraph/conf/GiraphConstants.java
<https://reviews.apache.org/r/13909/#comment50345>

"goes over allowed size of an array" -> "goes beyond the maximum size of a byte array". Setting this option will remove that limit. The maximum memory available for a single vertex will be limited to the maximum heap size available.


giraph-core/src/main/java/org/apache/giraph/conf/GiraphConstants.java
<https://reviews.apache.org/r/13909/#comment50343>

"of an array" -> "of a byte array"


giraph-core/src/main/java/org/apache/giraph/utils/io/BigDataInput.java
<https://reviews.apache.org/r/13909/#comment50346>

If you provide a method to get the number of buffers, you can allocate the array to the exact size.


giraph-core/src/main/java/org/apache/giraph/utils/io/BigDataInput.java
<https://reviews.apache.org/r/13909/#comment50347>

This is an assumption, right? If skipBytes returns something other than bytesLeftToSkip, it would be wrong.


giraph-core/src/main/java/org/apache/giraph/utils/io/BigDataOutput.java
<https://reviews.apache.org/r/13909/#comment50348>

2^31 bytes, right?


giraph-core/src/main/java/org/apache/giraph/utils/io/BigDataOutput.java
<https://reviews.apache.org/r/13909/#comment50344>

Should this be bigger than 32 MB? If we are hitting the 2 GB barrier, we will need 64 buffers just to get to 2 GB. Maybe 64 MB? Would that help reduce the overhead?


- Avery Ching


On Aug. 30, 2013, 3:19 a.m., Maja Kabiljo wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/13909/
> -----------------------------------------------------------
>
> (Updated Aug. 30, 2013, 3:19 a.m.)
>
>
> Review request for giraph.
>
>
> Bugs: GIRAPH-752
>     https://issues.apache.org/jira/browse/GIRAPH-752
>
>
> Repository: giraph-git
>
>
> Description
> -------
>
> We have seen crashes when a vertex receives a lot of messages and no
> combiner is used. This happens because the total size of the serialized
> messages for that vertex exceeds the maximum size of a byte array.
> We should implement an OutputStream which can handle data of arbitrary size
> and add an option to use that kind of stream for messages.
>
>
> Diffs
> -----
>
>   giraph-core/src/main/java/org/apache/giraph/comm/messages/ByteArrayMessagesPerVertexStore.java 6518da6
>   giraph-core/src/main/java/org/apache/giraph/comm/messages/MessagesIterable.java a466a8d
>   giraph-core/src/main/java/org/apache/giraph/comm/messages/out_of_core/PartitionDiskBackedMessageStore.java 7b3e548
>   giraph-core/src/main/java/org/apache/giraph/comm/messages/out_of_core/SequentialFileMessageStore.java 64031c3
>   giraph-core/src/main/java/org/apache/giraph/comm/messages/primitives/IntByteArrayMessageStore.java 597e7af
>   giraph-core/src/main/java/org/apache/giraph/comm/messages/primitives/LongByteArrayMessageStore.java 3fe6356
>   giraph-core/src/main/java/org/apache/giraph/conf/GiraphConstants.java 604729a
>   giraph-core/src/main/java/org/apache/giraph/conf/ImmutableClassesGiraphConfiguration.java 2506c21
>   giraph-core/src/main/java/org/apache/giraph/utils/ByteArrayIterable.java cf2c187
>   giraph-core/src/main/java/org/apache/giraph/utils/ByteArrayIterator.java 76ed789
>   giraph-core/src/main/java/org/apache/giraph/utils/ByteArrayVertexIdMessages.java 56cc01c
>   giraph-core/src/main/java/org/apache/giraph/utils/Factory.java PRE-CREATION
>   giraph-core/src/main/java/org/apache/giraph/utils/RepresentativeByteArrayIterable.java e3992ed
>   giraph-core/src/main/java/org/apache/giraph/utils/RepresentativeByteArrayIterator.java b6151c5
>   giraph-core/src/main/java/org/apache/giraph/utils/io/BigDataInput.java PRE-CREATION
>   giraph-core/src/main/java/org/apache/giraph/utils/io/BigDataInputOutput.java PRE-CREATION
>   giraph-core/src/main/java/org/apache/giraph/utils/io/BigDataOutput.java PRE-CREATION
>   giraph-core/src/main/java/org/apache/giraph/utils/io/DataInputOutput.java PRE-CREATION
>   giraph-core/src/main/java/org/apache/giraph/utils/io/ExtendedDataInputOutput.java PRE-CREATION
>   giraph-core/src/main/java/org/apache/giraph/utils/io/package-info.java PRE-CREATION
>
> Diff: https://reviews.apache.org/r/13909/diff/
>
>
> Testing
> -------
>
> Ran a job which fails with the original code and when the new option is not
> used, and verified that it works properly when the option is used.
> Also compared the performance with and without the change: it is the same
> when the option is off, and turning the option on seems to add about 5%
> overhead.
> mvn clean verify
>
>
> Thanks,
>
> Maja Kabiljo
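
The description asks for an OutputStream that can hold an arbitrary amount of data, and the review comments touch on the buffer count and on skipBytes. The Java sketch below is a hypothetical illustration under our own assumptions, not the BigDataOutput/BigDataInput code from this diff: ChunkedByteOutput, getNumberOfBuffers, toByteArrays, SkipUtils and skipFully are names invented here. It stores bytes in a list of fixed-size byte[] chunks, so the total size is tracked as a long and is bounded by the heap rather than by the maximum length of a single array. It also shows a skip loop that does not assume DataInput.skipBytes skips everything it is asked to.

```java
import java.io.DataInput;
import java.io.EOFException;
import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch (not Giraph's BigDataOutput): an OutputStream backed
 * by a list of fixed-size byte[] chunks. The total size is a long, so it is
 * limited by the heap, not by the maximum length of one byte array.
 */
final class ChunkedByteOutput extends OutputStream {
  private final int chunkSize;
  private final List<byte[]> chunks = new ArrayList<>();
  private byte[] current;    // chunk currently being filled (always the last one)
  private int posInCurrent;  // next free index in current
  private long size;         // total bytes written

  ChunkedByteOutput(int chunkSize) {
    if (chunkSize <= 0) {
      throw new IllegalArgumentException("chunkSize must be positive");
    }
    this.chunkSize = chunkSize;
  }

  @Override
  public void write(int b) {
    ensureSpace();
    current[posInCurrent++] = (byte) b;
    size++;
  }

  @Override
  public void write(byte[] b, int off, int len) {
    if (off < 0 || len < 0 || len > b.length - off) {
      throw new IndexOutOfBoundsException();
    }
    while (len > 0) {
      ensureSpace();
      // Copy as much as fits into the current chunk, then continue in a new one.
      int n = Math.min(len, chunkSize - posInCurrent);
      System.arraycopy(b, off, current, posInCurrent, n);
      posInCurrent += n;
      off += n;
      len -= n;
      size += n;
    }
  }

  /** Allocates a new chunk when there is none yet or the current one is full. */
  private void ensureSpace() {
    if (current == null || posInCurrent == chunkSize) {
      current = new byte[chunkSize];
      chunks.add(current);
      posInCurrent = 0;
    }
  }

  long size() {
    return size;
  }

  /** Lets callers size an array of buffers exactly instead of growing it. */
  int getNumberOfBuffers() {
    return chunks.size();
  }

  /**
   * Returns the chunks in order. Full chunks are shared, not copied; the
   * last chunk is copied and trimmed to the bytes actually written.
   */
  byte[][] toByteArrays() {
    byte[][] result = new byte[getNumberOfBuffers()][];
    for (int i = 0; i < result.length; i++) {
      byte[] chunk = chunks.get(i);
      result[i] = (i == result.length - 1) ? Arrays.copyOf(chunk, posInCurrent) : chunk;
    }
    return result;
  }
}

/** Hypothetical helper for the skipBytes concern raised in the review. */
final class SkipUtils {
  private SkipUtils() {}

  /**
   * DataInput.skipBytes may skip fewer bytes than requested, so keep calling
   * it and throw EOFException if it stops making progress.
   */
  static void skipFully(DataInput in, int n) throws IOException {
    int remaining = n;
    while (remaining > 0) {
      int skipped = in.skipBytes(remaining);
      if (skipped <= 0) {
        throw new EOFException(remaining + " bytes left to skip");
      }
      remaining -= skipped;
    }
  }
}
```

On the chunk-size question: with 32 MB chunks (2^25 bytes), 2^31 bytes of messages span 2^31 / 2^25 = 64 chunks; 64 MB chunks would halve that to 32. In this sketch each stream allocates whole chunks up front, so a larger chunk size also means more unused memory in the last, partially filled chunk.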
