zzcclp commented on issue #4943:
URL: https://github.com/apache/incubator-gluten/issues/4943#issuecomment-2003077901

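   > Cause: while the query was running there were 26,200 new byte[1024*1024] operations, an average of 78 per task, taking 8 s in total, while the whole query takes only 30+ s.
   > 
   > Question: why does it take the copying OnHeapCopyShuffleInputStream instead of the zero-copy LowCopyNettyShuffleInputStream?
   > 
   > Call chain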
   > 
   > ```
   > CHColumnarBatchSerializerInstance.deserializeStream
   > CHStreamReader.CHStreamReader
   > CHShuffleReadStreamFactory.create
   > ```
   > 
   > ```java
   > public static ShuffleInputStream create(
   >       InputStream in, boolean forceCompress, boolean isCustomizedShuffleCodec) {
   >     final InputStream unwrapped = unwrapInputStream(in, forceCompress, isCustomizedShuffleCodec);
   >     if (unwrapped != null) {
   >       return createCompressedShuffleInputStream(in, unwrapped);
   >     }
   >     return new OnHeapCopyShuffleInputStream(in, false);
   >   }
   > 
   >   private static InputStream unwrapInputStream(
   >       InputStream in, boolean forceCompress, boolean isCustomizedShuffleCodec) {
   >     if (forceCompress) {
   >       return unwrapSparkInputStream(in);
   >     } else if (isCustomizedShuffleCodec) {
   >       return unwrapSparkWithCompressedInputStream(in);
   >     }
   >     return null;
   >   }
   > ```
   > 
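   > Because my local environment does not set Celeborn as the shuffle manager, it ended up taking OnHeapCopyShuffleInputStream. The current implementation of OnHeapCopyShuffleInputStream is not very efficient, which ultimately caused the problem described in the title.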
   
   You may want to check your local call chain here. It should go through LowCopyFileSegmentShuffleInputStream, because the data is read directly from a local file.
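
For scale, 26,200 allocations of 1 MiB each add up to 26,200 MiB, roughly 25.6 GiB of short-lived heap buffers over the query. The sketch below is only a model of that pattern, not the actual OnHeapCopyShuffleInputStream code: the class `BufferReuseSketch` and all of its method names are hypothetical, and it assumes the cost comes from allocating a fresh 1 MiB array on every read instead of reusing one.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical model, not Gluten code: compares allocating a 1 MiB buffer
// on every read call with reusing a single caller-provided buffer.
final class BufferReuseSketch {
    static final int BUFFER_SIZE = 1024 * 1024; // same size as in the report

    // Returns {bytesRead, buffersAllocated}; allocates a new buffer per read.
    static long[] readAllocatingPerCall(InputStream in) throws IOException {
        long bytes = 0;
        long buffers = 0;
        while (true) {
            byte[] buf = new byte[BUFFER_SIZE]; // fresh allocation each iteration
            buffers++;
            int n = in.read(buf, 0, buf.length);
            if (n < 0) {
                break; // end of stream
            }
            bytes += n;
        }
        return new long[] {bytes, buffers};
    }

    // Returns {bytesRead, buffersAllocated}; the single buffer is reused.
    static long[] readWithReusedBuffer(InputStream in, byte[] buf) throws IOException {
        long bytes = 0;
        int n;
        while ((n = in.read(buf, 0, buf.length)) >= 0) {
            bytes += n;
        }
        return new long[] {bytes, 1};
    }

    // Total heap allocated, in MiB, for the given number of 1 MiB buffers.
    static long totalAllocatedMiB(long allocations) {
        return allocations * BUFFER_SIZE / (1024L * 1024L);
    }
}
```

Both methods read the same number of bytes; only the number of buffers allocated differs. Whether this matches what OnHeapCopyShuffleInputStream does internally would still need to be confirmed against the call chain.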


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

