advancedxy commented on code in PR #495:
URL: https://github.com/apache/incubator-uniffle/pull/495#discussion_r1072061228
##########
client-spark/common/src/main/java/org/apache/spark/shuffle/reader/RssShuffleDataIterator.java:
##########
@@ -66,7 +67,8 @@ public RssShuffleDataIterator(
this.serializerInstance = serializer.newInstance();
this.shuffleReadClient = shuffleReadClient;
this.shuffleReadMetrics = shuffleReadMetrics;
- this.codec = Codec.newInstance(rssConf);
+    boolean compress = rssConf.getBoolean(RssClientConfig.SPARK_SHUFFLE_COMPRESS, true);
Review Comment:
Got it. Of course, we should detect `spark.shuffle.compress`.
But if we have to redeclare it, the better place would be
`RssShuffleConf.java`, since `RssClientConfig` is shared across
multiple engines.
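To illustrate the point about layering, here is a minimal, hypothetical sketch of reading Spark's `spark.shuffle.compress` flag through a Spark-specific config layer (the `CompressFlagSketch` class, key constant, and `getBoolean` stand-in are all assumptions for illustration, not the project's actual API):

```java
import java.util.HashMap;
import java.util.Map;

public class CompressFlagSketch {
    // Spark's own property name; defaulting to true mirrors Spark's default.
    static final String SPARK_SHUFFLE_COMPRESS = "spark.shuffle.compress";
    static final boolean SPARK_SHUFFLE_COMPRESS_DEFAULT = true;

    // Minimal stand-in for a conf object with a getBoolean(key, default) accessor.
    static boolean getBoolean(Map<String, String> conf, String key, boolean dflt) {
        String v = conf.get(key);
        return v == null ? dflt : Boolean.parseBoolean(v);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // Unset key falls back to the default.
        System.out.println(getBoolean(conf, SPARK_SHUFFLE_COMPRESS, SPARK_SHUFFLE_COMPRESS_DEFAULT)); // true
        conf.put(SPARK_SHUFFLE_COMPRESS, "false");
        System.out.println(getBoolean(conf, SPARK_SHUFFLE_COMPRESS, SPARK_SHUFFLE_COMPRESS_DEFAULT)); // false
    }
}
```

Keeping such a Spark-only key out of the engine-agnostic `RssClientConfig` avoids leaking Spark naming into other engines' config surfaces.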
##########
client-spark/common/src/main/java/org/apache/spark/shuffle/reader/RssShuffleDataIterator.java:
##########
@@ -140,6 +125,29 @@ public boolean hasNext() {
return recordsIterator.hasNext();
}
+  private int uncompress(CompressedShuffleBlock compressedBlock, ByteBuffer compressedData) {
+    long compressedDataLength = compressedData.limit() - compressedData.position();
+    compressedBytesLength += compressedDataLength;
+    shuffleReadMetrics.incRemoteBytesRead(compressedDataLength);
+
+    int uncompressedLen = compressedBlock.getUncompressLength();
+    if (codec != null) {
+      if (uncompressedData == null || uncompressedData.capacity() < uncompressedLen) {
+        // todo: support off-heap bytebuffer
+        uncompressedData = ByteBuffer.allocate(uncompressedLen);
+      }
+      uncompressedData.clear();
+      long startDecompress = System.currentTimeMillis();
+      codec.decompress(compressedData, uncompressedLen, uncompressedData, 0);
+      unCompressedBytesLength += uncompressedLen;
+      long decompressDuration = System.currentTimeMillis() - startDecompress;
+      decompressTime += decompressDuration;
+    } else {
+      uncompressedData = compressedData;
Review Comment:
Anyway, we may need to update the log info in L134-136 to reflect the
no-compress case.
```
LOG.info("Fetch " + compressedBytesLength + " bytes cost " + readTime + " ms and "
    + serializeTime + " ms to serialize, " + decompressTime + " ms to decompress with unCompressionLength["
    + unCompressedBytesLength + "]");
```
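As a hypothetical sketch of the suggested tweak (the `LogMessageSketch` class and `fetchLog` helper are illustrative assumptions, not code from the PR), the message could branch on whether a codec was actually used, so the no-compress path does not report a misleading decompress time or uncompressed length:

```java
public class LogMessageSketch {
    // Builds the fetch summary; when no codec is configured, the bytes read
    // are already uncompressed, so skip the decompress-specific fields.
    static String fetchLog(boolean compressed, long compressedBytes, long uncompressedBytes,
                           long readTime, long serializeTime, long decompressTime) {
        if (compressed) {
            return "Fetch " + compressedBytes + " bytes cost " + readTime + " ms and "
                + serializeTime + " ms to serialize, " + decompressTime
                + " ms to decompress with unCompressionLength[" + uncompressedBytes + "]";
        }
        return "Fetch " + compressedBytes + " bytes (uncompressed) cost " + readTime
            + " ms and " + serializeTime + " ms to serialize";
    }

    public static void main(String[] args) {
        System.out.println(fetchLog(true, 100, 200, 5, 3, 2));
        System.out.println(fetchLog(false, 100, 100, 5, 3, 0));
    }
}
```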
##########
client-spark/common/src/main/java/org/apache/spark/shuffle/reader/RssShuffleDataIterator.java:
##########
@@ -140,6 +125,29 @@ public boolean hasNext() {
return recordsIterator.hasNext();
}
+  private int uncompress(CompressedShuffleBlock compressedBlock, ByteBuffer compressedData) {
+    long compressedDataLength = compressedData.limit() - compressedData.position();
+    compressedBytesLength += compressedDataLength;
+    shuffleReadMetrics.incRemoteBytesRead(compressedDataLength);
+
+    int uncompressedLen = compressedBlock.getUncompressLength();
+    if (codec != null) {
+      if (uncompressedData == null || uncompressedData.capacity() < uncompressedLen) {
+        // todo: support off-heap bytebuffer
+        uncompressedData = ByteBuffer.allocate(uncompressedLen);
+      }
+      uncompressedData.clear();
+      long startDecompress = System.currentTimeMillis();
+      codec.decompress(compressedData, uncompressedLen, uncompressedData, 0);
+      unCompressedBytesLength += uncompressedLen;
+      long decompressDuration = System.currentTimeMillis() - startDecompress;
+      decompressTime += decompressDuration;
+    } else {
+      uncompressedData = compressedData;
Review Comment:
But we are still adding up `compressedBytesLength` in L130 even in the
no-compress case. Should we be consistent?
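One way to read the consistency concern is sketched below (the `ByteAccountingSketch` class and `account` helper are hypothetical, purely for illustration): when no codec is present, the same bytes could be counted on both totals so that the compressed and uncompressed counters stay comparable.

```java
public class ByteAccountingSketch {
    long compressedBytesLength = 0;
    long unCompressedBytesLength = 0;

    // When a codec decompressed the block, count its declared uncompressed
    // length; otherwise the fetched bytes are already uncompressed, so count
    // the same value on both sides to keep the totals consistent.
    void account(long blockBytes, boolean hasCodec, long uncompressedLen) {
        compressedBytesLength += blockBytes;
        unCompressedBytesLength += hasCodec ? uncompressedLen : blockBytes;
    }

    public static void main(String[] args) {
        ByteAccountingSketch s = new ByteAccountingSketch();
        s.account(100, true, 250);  // compressed block: 100 in, 250 out
        s.account(80, false, 80);   // uncompressed block: same bytes both sides
        System.out.println(s.compressedBytesLength);    // 180
        System.out.println(s.unCompressedBytesLength);  // 330
    }
}
```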
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]