+1 worthwhile to lower Spark small-data overhead

On Mon, May 4, 2026 at 11:47 PM Ángel Álvarez Pascua <[email protected]> wrote:
> Love it. Please, count on me if any help is needed.
>
> On Tue, May 5, 2026, 7:31, DB Tsai <[email protected]> wrote:
>
>> Thanks Daniel and Liang-Chi for driving this. This is an exciting
>> proposal that can significantly speed up local experimentation and
>> development on laptops. It also helps make Spark a great fit for both
>> big-data workloads and small-data exploratory workflows.
>>
>> DB Tsai | https://www.dbtsai.com/ | PGP 0x9FB9FAA3
>>
>> On Monday, May 4th, 2026 at 3:39 PM, Daniel Tenedorio <[email protected]> wrote:
>>
>> Hi Spark community,
>>
>> We’d like to propose a new SPIP to improve the experience of running
>> Apache Spark on laptops.
>>
>> SPIP doc:
>>
>> https://docs.google.com/document/d/1Nphejrf_vh4YRECn0JPgKClqxDS_lB6wufZFJQxyY98/edit?tab=t.0#heading=h.hj76akdx5ul
>>
>> Summary:
>>
>> Spark’s execution model is optimized for distributed workloads, but this
>> introduces noticeable overhead for small datasets (e.g., <100MB), where
>> even simple queries can take multiple seconds. This makes Spark less
>> suitable for interactive and exploratory use cases on laptops, and often
>> pushes users toward alternative single-node tools.
>>
>> This proposal aims to reduce that overhead in local mode, improving
>> latency for small queries and making Spark more usable as an entry point
>> for new users and iterative workflows.
>>
>> We’d appreciate your review and feedback.
>>
>> Thanks,
>> Daniel Tenedorio and Liang-Chi Hsieh

--
John Zhuge
