dingshun3016 commented on PR #1636:
URL: 
https://github.com/apache/incubator-uniffle/pull/1636#issuecomment-2047151541

   > I don't see the root cause of this bug. @dingshun3016 Could you help add 
test case to simulate this or add more description
   
   Here is the log of one case from our production environment.
   **First**
   `LocalFileServerReadHandler` is initialized, but there are no index files or data files yet in 
`application_1711720273776_2156538_-779157491-37794/0/1176-1176`
   > `[INFO] 2024-04-02 14:23:13,880 Grpc-116 LocalFileServerReadHandler 
prepareFilePath - index files not find, baseFolder is 
/home/vipshop/hard_disk/0/uniffle_data/application_1711720273776_2156538_-779157491-37794/0/1176-1176,
 appId application_1711720273776_2156538_-779157491-37794 shuffleId 0 
partitionId 1176 partitionNumPerRange 1 partitionNum 2048 storageBasePath 
/home/vipshop/hard_disk/0/uniffle_data`
   
   > `[INFO] 2024-04-02 14:23:13,880 Grpc-116 ShuffleServerGrpcService 
getLocalShuffleIndex - Successfully getShuffleIndex cost 0 ms for 0 bytes with 
appId[application_1711720273776_2156538_-779157491-37794], shuffleId[0], 
partitionId[1176]`
   
   **Second**
   The shuffle server flushes the event from memory to a local file.
   > `[DEBUG] 2024-04-02 14:25:51,696 LocalFileFlushEventThreadPool-42 
ShuffleFlushManager processEvent - Flush to file success in 22 ms and release 
6765262 bytes event ShuffleDataFlushEvent: eventId=31941657, 
appId=application_1711720273776_2156538_-779157491-37794, shuffleId=0, 
startPartition=1176, endPartition=1176, retryTimes=0, 
underStorage=LocalStorage, isPended=false`
   
   **Third**
   Because the `LocalFileServerReadHandler` initialized in the first step is 
reused as-is, it still sees no index file or data file, even though the 
directory is no longer empty at this point.
   > `[INFO] 2024-04-02 14:26:24,636 Grpc-13 ShuffleServerGrpcService 
getLocalShuffleIndex - Successfully getShuffleIndex cost 0 ms for 0 bytes with 
appId[application_1711720273776_2156538_-779157491-37794], shuffleId[0], 
partitionId[1176]`
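
   The three steps above can be reproduced in miniature. The sketch below is not Uniffle code: `CachingReadHandler`, `LazyReadHandler`, and the file name `part.index` are hypothetical stand-ins, and it only assumes that the handler resolves its file once at construction and is then reused for later requests.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class StaleHandlerSketch {

    // Hypothetical stand-in for a read handler that looks for its index file
    // once, at construction time, and never looks again.
    static class CachingReadHandler {
        private final Path indexFile; // null when nothing existed at construction

        CachingReadHandler(Path partitionDir) {
            Path candidate = partitionDir.resolve("part.index");
            this.indexFile = Files.exists(candidate) ? candidate : null;
        }

        long indexBytes() throws IOException {
            return indexFile == null ? 0L : Files.size(indexFile);
        }
    }

    // Hypothetical alternative that re-checks the directory on every read.
    static class LazyReadHandler {
        private final Path partitionDir;

        LazyReadHandler(Path partitionDir) {
            this.partitionDir = partitionDir;
        }

        long indexBytes() throws IOException {
            Path candidate = partitionDir.resolve("part.index");
            return Files.exists(candidate) ? Files.size(candidate) : 0L;
        }
    }

    // Returns {bytes seen before flush, cached handler after flush, lazy handler after flush}.
    static long[] simulate() throws IOException {
        Path dir = Files.createTempDirectory("partition-1176");

        // First: handlers are created while the directory is still empty.
        CachingReadHandler cached = new CachingReadHandler(dir);
        LazyReadHandler lazy = new LazyReadHandler(dir);
        long before = cached.indexBytes();

        // Second: a flush writes the index file after the handlers exist.
        Files.write(dir.resolve("part.index"), new byte[40]);

        // Third: the reused caching handler still reports nothing.
        return new long[] {before, cached.indexBytes(), lazy.indexBytes()};
    }

    public static void main(String[] args) throws IOException {
        long[] r = simulate();
        System.out.println("before flush: " + r[0]
            + ", cached after flush: " + r[1]
            + ", lazy after flush: " + r[2]);
    }
}
```

   Under these assumptions the reused handler keeps answering with 0 bytes after the flush, which matches the second `getLocalShuffleIndex` log line; re-checking the directory on each read is one possible way to avoid the stale view, not necessarily what the PR does.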
   
   **App log**
   > `24/04/02 14:26:27 ERROR [Executor task launch worker for task 2507] 
Executor: Exception in task 1176.3 in stage 5.0 (TID 2507)
   org.apache.uniffle.common.exception.RssException: Blocks read inconsistent: 
expected 506 blocks, actual 459 blocks
   at 
org.apache.uniffle.common.util.RssUtils.checkProcessedBlockIds(RssUtils.java:375)
   at 
org.apache.uniffle.client.impl.ShuffleReadClientImpl.checkProcessedBlockIds(ShuffleReadClientImpl.java:279)
   at 
org.apache.spark.shuffle.reader.RssShuffleDataIterator.hasNext(RssShuffleDataIterator.java:131)
   at 
org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:31)
   at 
org.apache.spark.shuffle.reader.RssShuffleReader$MultiPartitionIterator.hasNext(RssShuffleReader.java:297)
   at 
org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
   at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
   at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage4.sort_addToSorter_0$(Unknown
 Source)
   at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage4.processNext(Unknown
 Source)
   at 
org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:47)
   at 
org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:729)
   at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage6.findNextInnerJoinRows$(Unknown
 Source)
   at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage6.processNext(Unknown
 Source)
   at 
org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:47)
   at 
org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$2.hasNext(WholeStageCodegenExec.scala:748)
   at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

