Interesting
________________________________
From: Cheng Pan <[email protected]>
Sent: Wednesday, May 6, 2026 8:01 PM
To: [email protected] <[email protected]>
Subject: Re: [DISCUSS] SPIP: Faster queries in local laptop mode for Apache Spark

+1. I also left a comment in the doc about the Hadoop client improvement, which should also benefit running Spark on a laptop.

Thanks,
Cheng Pan

On May 6, 2026, at 15:01, John Zhuge <[email protected]> wrote:

+1 worthwhile to lower Spark small-data overhead

On Mon, May 4, 2026 at 11:47 PM Ángel Álvarez Pascua <[email protected]> wrote:

Love it. Please count on me if any help is needed.

On Tue, May 5, 2026, 7:31, DB Tsai <[email protected]> wrote:

Thanks Daniel and Liang-Chi for driving this. This is an exciting proposal that can significantly speed up local experimentation and development on laptops. It also helps make Spark a great fit for both big-data workloads and small-data exploratory workflows.

DB Tsai | https://www.dbtsai.com/ | PGP 0x9FB9FAA3

On Monday, May 4th, 2026 at 3:39 PM, Daniel Tenedorio <[email protected]> wrote:

Hi Spark community,

We’d like to propose a new SPIP to improve the experience of running Apache Spark on laptops.

SPIP doc: https://docs.google.com/document/d/1Nphejrf_vh4YRECn0JPgKClqxDS_lB6wufZFJQxyY98/edit?tab=t.0#heading=h.hj76akdx5ul

Summary: Spark’s execution model is optimized for distributed workloads, but this introduces noticeable overhead for small datasets (e.g., <100MB), where even simple queries can take multiple seconds. This makes Spark less suitable for interactive and exploratory use cases on laptops, and often pushes users toward alternative single-node tools. This proposal aims to reduce that overhead in local mode, improving latency for small queries and making Spark more usable as an entry point for new users and iterative workflows.

We’d appreciate your review and feedback.

Thanks,
Daniel Tenedorio and Liang-Chi Hsieh

--
John Zhuge
