Re: Spark Push-Based Shuffle causing multiple stage failures

2022-05-28 Thread Ye Zhou
Hi, Han. The External Shuffle Service (ESS) on YARN has to be configured in yarn-site.xml for the NodeManagers, as it is an auxiliary service in the NodeManager. We will try to improve the documentation for enabling push-based shuffle. Thanks for the feedback. For the straggler issue, is

k-anonymity with Spark in Java

2022-05-28 Thread marc nicole
Hi Spark devs, Anybody willing to check my code implementing *k-anonymity*? public static Dataset<Row> kAnonymizeBySuppression(SparkSession sparksession, Dataset<Row> initDataset, List<String> qidAtts, Integer k_anonymity_constant) { Dataset<Row> anonymizedDF =