zhilinli123 commented on PR #5660:
URL: https://github.com/apache/hudi/pull/5660#issuecomment-1134746864

   > > I am okay with this change to undo the regression. But let's file a 
ticket to fix the root cause. For what it's worth, I tested a multi-writer 
scenario (one deltastreamer, one datasource) locally with Apache Spark for more 
than 10 commits and it ran fine. Let's see if it's really related to EMRFS 
cache. We can land this once the CI is green. cc @xushiyan could you take a 
look as well?
   > 
   > Yeah, concurrency bugs are really hard to test. My strategy is to send 
the patch to my users so they can test it against their various use cases. I'm 
also waiting for their feedback; once they respond that the bug has disappeared 
from their perspective, I will merge this patch.
   > 
   > The S3 cache may cause the problem, but it's definitely not the only 
cause, because our users use HDFS.
   
   I'm testing this right now. If there is a problem, it should show up by 
tomorrow, and I'll report back to the community as soon as possible.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

