Greetings fellow practitioners of the art,

We believe we are ready to call a vote on the *Faster queries in local laptop mode for Apache Spark* SPIP.

Motivation: We want to improve Spark's usability and interactivity for small-data queries, specifically on laptops. This would make Spark more useful for individual users and for beginners prototyping.

Proposal: This SPIP proposes three categories of performance improvements: optimizations for single-file scans, an Arrow-based df.cache reimplementation, and shuffle-free local execution for small queries. The community has also suggested a couple of other ideas in a similar spirit on the document, and a couple of members have volunteered to help with the implementation.

SPIP Document: https://docs.google.com/document/d/1Nphejrf_vh4YRECn0JPgKClqxDS_lB6wufZFJQxyY98/edit?tab=t.0#heading=h.hj76akdx5ul

The vote will be open for at least 72 hours, and passes if a majority of +1 PMC votes are cast, with a minimum of 3 +1 votes.

Please vote:

[ ] +1: Accept the proposal as an official SPIP
[ ] +0
[ ] -1: I don't think this is a good idea because ...

Best,
Daniel Tenedorio and Liang-Chi Hsieh
