This is an automated email from the ASF dual-hosted git repository.

feiwang pushed a commit to branch branch-0.5
in repository https://gitbox.apache.org/repos/asf/celeborn.git


The following commit(s) were added to refs/heads/branch-0.5 by this push:
     new 70f3b3383 [CELEBORN-1533] Log location when 
CelebornInputStream#fillBuffer fails
70f3b3383 is described below

commit 70f3b338387ccec0790582391ddcea15cad40fe6
Author: sychen <[email protected]>
AuthorDate: Thu Aug 1 22:01:27 2024 -0700

    [CELEBORN-1533] Log location when CelebornInputStream#fillBuffer fails
    
    ### What changes were proposed in this pull request?
    This PR aims to  log location when `CelebornInputStream#fillBuffer` fails.
    
    ### Why are the changes needed?
    ```java
    24/07/30 22:29:04,181 [Executor task launch worker for task 353.0 in stage 
10.0 (TID 50641)] ERROR Executor: Exception in task 353.0 in stage 10.0 (TID 
50641)
    com.github.luben.zstd.ZstdException: Data corruption detected
            at 
com.github.luben.zstd.ZstdDecompressCtx.decompressByteArray(ZstdDecompressCtx.java:205)
            at com.github.luben.zstd.Zstd.decompressByteArray(Zstd.java:439)
            at 
org.apache.celeborn.client.compress.ZstdDecompressor.decompress(ZstdDecompressor.java:54)
            at 
org.apache.celeborn.client.read.CelebornInputStream$CelebornInputStreamImpl.fillBuffer(CelebornInputStream.java:563)
            at 
org.apache.celeborn.client.read.CelebornInputStream$CelebornInputStreamImpl.read(CelebornInputStream.java:464)
            at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
            at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
            at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
            at java.io.DataInputStream.read(DataInputStream.java:149)
            at org.sparkproject.guava.io.ByteStreams.read(ByteStreams.java:899)
            at 
org.sparkproject.guava.io.ByteStreams.readFully(ByteStreams.java:733)
            at 
org.apache.spark.sql.execution.UnsafeRowSerializerInstance$$anon$2$$anon$3.next(UnsafeRowSerializer.scala:127)
            at 
org.apache.spark.sql.execution.UnsafeRowSerializerInstance$$anon$2$$anon$3.next(UnsafeRowSerializer.scala:110)
            at scala.collection.Iterator$$anon$11.next(Iterator.scala:496)
            at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
            at 
org.apache.spark.util.CompletionIterator.next(CompletionIterator.scala:29)
            at 
org.apache.spark.InterruptibleIterator.next(InterruptibleIterator.scala:40)
            at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
            at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage6.sort_addToSorter_0$(Unknown
 Source)
            at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage6.processNext(Unknown
 Source)
            at 
org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
            at 
org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:759)
            at 
org.apache.spark.sql.execution.RowIteratorFromScala.advanceNext(RowIterator.scala:82)
            at 
org.apache.spark.sql.execution.joins.SortMergeJoinScanner.advancedStreamed(SortMergeJoinExec.scala:1065)
    ```
    
    ### Does this PR introduce _any_ user-facing change?
    No
    
    ### How was this patch tested?
    GA
    
    Closes #2655 from cxzl25/CELEBORN-1533.
    
    Lead-authored-by: sychen <[email protected]>
    Co-authored-by: cxzl25 <[email protected]>
    Signed-off-by: Wang, Fei <[email protected]>
    (cherry picked from commit d0ef714a4512fec0d6f11da36d572188d2bd6a40)
    Signed-off-by: Wang, Fei <[email protected]>
---
 .../apache/celeborn/client/read/CelebornInputStream.java | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git 
a/client/src/main/java/org/apache/celeborn/client/read/CelebornInputStream.java 
b/client/src/main/java/org/apache/celeborn/client/read/CelebornInputStream.java
index f6d7ba80d..cd1e5d061 100644
--- 
a/client/src/main/java/org/apache/celeborn/client/read/CelebornInputStream.java
+++ 
b/client/src/main/java/org/apache/celeborn/client/read/CelebornInputStream.java
@@ -665,6 +665,13 @@ public abstract class CelebornInputStream extends 
InputStream {
 
         return hasData;
       } catch (IOException e) {
+        logger.error(
+            "Failed to fill buffer from chunk. AppShuffleId {}, shuffleId {}, 
partitionId {}, location {}",
+            appShuffleId,
+            shuffleId,
+            partitionId,
+            currentReader.getLocation(),
+            e);
         IOException ioe = e;
         if (exceptionMaker != null) {
           if (shuffleClient.reportShuffleFetchFailure(appShuffleId, 
shuffleId)) {
@@ -680,6 +687,15 @@ public abstract class CelebornInputStream extends 
InputStream {
           }
         }
         throw ioe;
+      } catch (Exception e) {
+        logger.error(
+            "Failed to read data from chunk. AppShuffleId {}, shuffleId {}, 
partitionId {}, location {}",
+            appShuffleId,
+            shuffleId,
+            partitionId,
+            currentReader.getLocation(),
+            e);
+        throw e;
       }
     }
 

Reply via email to