+1, it is worthwhile to lower Spark's small-data overhead.

On Mon, May 4, 2026 at 11:47 PM Ángel Álvarez Pascua <
[email protected]> wrote:

> Love it. Please count on me if any help is needed.
>
> On Tue, May 5, 2026 at 7:31, DB Tsai <[email protected]> wrote:
>
>> Thanks Daniel and Liang-Chi for driving this. This is an exciting
>> proposal that can significantly speed up local experimentation and
>> development on laptops. It also helps make Spark a great fit for both
>> big-data workloads and small-data exploratory workflows.
>>
>> DB Tsai  |  https://www.dbtsai.com/  |  PGP 0x9FB9FAA3
>>
>> On Monday, May 4th, 2026 at 3:39 PM, Daniel Tenedorio <
>> [email protected]> wrote:
>>
>> Hi Spark community,
>>
>> We’d like to propose a new SPIP to improve the experience of running
>> Apache Spark on laptops.
>>
>> SPIP doc:
>>
>>
>> https://docs.google.com/document/d/1Nphejrf_vh4YRECn0JPgKClqxDS_lB6wufZFJQxyY98/edit?tab=t.0#heading=h.hj76akdx5ul
>>
>> Summary:
>>
>> Spark’s execution model is optimized for distributed workloads, but this
>> introduces noticeable overhead for small datasets (e.g., <100MB), where
>> even simple queries can take multiple seconds. This makes Spark less
>> suitable for interactive and exploratory use cases on laptops, and often
>> pushes users toward alternative single-node tools.
>>
>> This proposal aims to reduce that overhead in local mode, improving
>> latency for small queries and making Spark more usable as an entry point
>> for new users and iterative workflows.
>>
>> We’d appreciate your review and feedback.
>>
>> Thanks,
>> Daniel Tenedorio and Liang-Chi Hsieh
>>
>>
>>

-- 
John Zhuge
