nsivabalan commented on pull request #2319: URL: https://github.com/apache/hudi/pull/2319#issuecomment-743289804
Yeah, the [same](https://github.com/apache/hudi/pull/1721) was already brought up before, and we didn't proceed since it needed some perf analysis. This was the rationale: even though explodeRecordRDDWithFileComparisons(...) is called in two places, the first call only feeds a count by key, which may not shuffle any actual data, whereas the second call could incur shuffling. Hence the usage patterns differ. More info can be found in the linked PR.
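To make the difference between the two usage patterns concrete, here is a rough plain-Java sketch. It is not Hudi or Spark code: the `Comparison` class, the method names, and the sample data are all hypothetical, and the partitions are simulated with nested lists. It only assumes that a count by key can pre-aggregate inside each partition, so that small (key, count) entries cross the shuffle boundary, whereas a step that shuffles the exploded pairs moves every pair.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ShuffleSketch {
    // Hypothetical stand-in for one exploded (fileId, recordKey) comparison pair.
    static final class Comparison {
        final String fileId;
        final String recordKey;

        Comparison(String fileId, String recordKey) {
            this.fileId = fileId;
            this.recordKey = recordKey;
        }
    }

    // Count-by-key pattern: each simulated partition aggregates locally first,
    // so only one (fileId, count) entry per distinct fileId per partition moves.
    static int shuffledEntriesForCount(List<List<Comparison>> partitions) {
        int shuffled = 0;
        for (List<Comparison> partition : partitions) {
            Map<String, Long> localCounts = new HashMap<>();
            for (Comparison c : partition) {
                localCounts.merge(c.fileId, 1L, Long::sum);
            }
            shuffled += localCounts.size();
        }
        return shuffled;
    }

    // Shuffle-the-pairs pattern: every exploded pair itself has to move.
    static int shuffledEntriesForPairShuffle(List<List<Comparison>> partitions) {
        int shuffled = 0;
        for (List<Comparison> partition : partitions) {
            shuffled += partition.size();
        }
        return shuffled;
    }

    // Made-up input: two partitions with three pairs each, spread over two files.
    static List<List<Comparison>> sampleInput() {
        List<Comparison> p1 = new ArrayList<>();
        p1.add(new Comparison("f1", "k1"));
        p1.add(new Comparison("f1", "k2"));
        p1.add(new Comparison("f2", "k3"));
        List<Comparison> p2 = new ArrayList<>();
        p2.add(new Comparison("f1", "k4"));
        p2.add(new Comparison("f2", "k5"));
        p2.add(new Comparison("f2", "k6"));
        List<List<Comparison>> partitions = new ArrayList<>();
        partitions.add(p1);
        partitions.add(p2);
        return partitions;
    }

    public static void main(String[] args) {
        List<List<Comparison>> input = sampleInput();
        System.out.println(shuffledEntriesForCount(input));       // prints 4
        System.out.println(shuffledEntriesForPairShuffle(input)); // prints 6
    }
}
```

The gap between the two numbers grows with the number of pairs per file, which is why the cost of the two call sites can't be judged without the perf analysis mentioned above.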
