Re: [DISCUSS] SPIP: Lazy Materialization for Parquet Read Performance Improvement

kazuyuki tanimura Thu, 02 Feb 2023 10:11:48 -0800

Thank you all for the +1s and for reviewing the SPIP doc.

Kazu


> On Feb 1, 2023, at 1:28 AM, Dongjoon Hyun <[email protected]> wrote:
> 
> +1
> 
> On Wed, Feb 1, 2023 at 12:52 AM Mich Talebzadeh <[email protected] 
> <mailto:[email protected]>> wrote:
> +1
> 
> 
>    view my Linkedin profile 
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
> 
>  https://en.everybodywiki.com/Mich_Talebzadeh 
> <https://en.everybodywiki.com/Mich_Talebzadeh>
>  
> Disclaimer: Use it at your own risk. Any and all responsibility for any loss, 
> damage or destruction of data or any other property which may arise from 
> relying on this email's technical content is explicitly disclaimed. The 
> author will in no case be liable for any monetary damages arising from such 
> loss, damage or destruction.
>  
> 
> 
> On Wed, 1 Feb 2023 at 02:23, huaxin gao <[email protected] 
> <mailto:[email protected]>> wrote:
> +1
> 
> On Tue, Jan 31, 2023 at 6:10 PM DB Tsai <[email protected] 
> <mailto:[email protected]>> wrote:
> +1
> 
> Sent from my iPhone
> 
>> On Jan 31, 2023, at 4:16 PM, Yuming Wang <[email protected] 
>> <mailto:[email protected]>> wrote:
>> 
>> 
>> +1.
>> 
>> On Wed, Feb 1, 2023 at 7:42 AM kazuyuki tanimura 
>> <[email protected]> wrote:
>> Great! Much appreciated, Mich!
>> 
>> Kazu
>> 
>>> On Jan 31, 2023, at 3:07 PM, Mich Talebzadeh <[email protected] 
>>> <mailto:[email protected]>> wrote:
>>> 
>>> Thanks, Kazu.
>>> 
>>> I followed that template link and indeed, as you pointed out, it is a 
>>> common template. If it works then it is what it is.
>>> 
>>> I will be going through your design proposals and hopefully we can review 
>>> them.
>>> 
>>> Regards,
>>> 
>>> Mich
>>> 
>>> 
>>> 
>>> On Tue, 31 Jan 2023 at 22:34, kazuyuki tanimura <[email protected] 
>>> <mailto:[email protected]>> wrote:
>>> Thank you, Mich. I followed the instructions at 
>>> https://spark.apache.org/improvement-proposals.html and used its template.
>>> While we are open to revising our design doc, it seems more like you are 
>>> proposing that the community change the instructions themselves?
>>> 
>>> Kazu
>>> 
>>>> On Jan 31, 2023, at 11:24 AM, Mich Talebzadeh <[email protected] 
>>>> <mailto:[email protected]>> wrote:
>>>> 
>>>> Hi,
>>>> 
>>>> Thanks for these proposals. Good suggestions. Is this style of breaking 
>>>> down your approach standard?
>>>> 
>>>> My view would be that perhaps it makes more sense to follow the 
>>>> industry-established approach of breaking down your technical proposal into:
>>>> 
>>>> Background
>>>> Objective
>>>> Scope
>>>> Constraints
>>>> Assumptions
>>>> Reporting
>>>> Deliverables
>>>> Timelines
>>>> Appendix
>>>> 
>>>> Your current approach, using the questions below:
>>>> 
>>>> Q1. What are you trying to do? Articulate your objectives using absolutely 
>>>> no jargon. What are you trying to achieve?
>>>> Q2. What problem is this proposal NOT designed to solve? What issues is 
>>>> the suggested proposal not going to address?
>>>> Q3. How is it done today, and what are the limits of current practice?
>>>> Q4. What is new in your approach and why do you think it will be 
>>>> successful?
>>>> Q5. Who cares? If you are successful, what difference will it make? If 
>>>> your proposal succeeds, what tangible benefits will it add?
>>>> Q6. What are the risks?
>>>> Q7. How long will it take?
>>>> Q8. What are the midterm and final “exams” to check for success?
>>>> 
>>>> may not do justice to your proposal.
>>>> 
>>>> HTH
>>>> 
>>>> Mich
>>>> 
>>>> 
>>>> 
>>>> On Tue, 31 Jan 2023 at 17:35, kazuyuki tanimura 
>>>> <[email protected] <mailto:[email protected]>> wrote:
>>>> Hi everyone,
>>>> 
>>>> I would like to start a discussion on “Lazy Materialization for Parquet 
>>>> Read Performance Improvement”.
>>>> 
>>>> Chao and I propose a Parquet reader with lazy materialization. For 
>>>> Spark SQL filter operations, evaluating the filters first and lazily 
>>>> materializing only the values that are used can avoid wasted computation 
>>>> and improve read performance.
>>>> The current implementation of Spark requires the read values to be 
>>>> materialized (i.e., decompressed, decoded, etc.) in memory before the 
>>>> filters are applied, even though the filters may eventually throw away 
>>>> many values.
>>>> 
>>>> Our design doc is as follows.
>>>> SPIP Jira: https://issues.apache.org/jira/browse/SPARK-42256
>>>> SPIP Doc: 
>>>> https://docs.google.com/document/d/1Kr3y2fVZUbQXGH0y8AvdCAeWC49QJjpczapiaDvFzME
>>>> 
>>>> Liang-Chi was kind enough to shepherd this effort. 
>>>> 
>>>> Thank you
>>>> Kazu
>>> 
>>
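
For readers skimming the thread, the idea in the original proposal (evaluate the filter first, then materialize only the surviving values) can be sketched roughly as below. This is a toy illustration under stated assumptions, not the SPIP's design or Spark's Parquet reader: the helper names (encode_column, decode_value, eager_read, lazy_read) are hypothetical, each value is compressed on its own with zlib and JSON purely to make the decode cost countable, and a real Parquet reader works on pages and batches rather than single values.

```python
import json
import zlib


def encode_column(values):
    """Hypothetical stand-in for an encoded, compressed column: one blob per value."""
    return [zlib.compress(json.dumps(v).encode("utf-8")) for v in values]


def decode_value(blob, stats):
    """Decompress and decode one value, counting the work in stats['decoded']."""
    stats["decoded"] += 1
    return json.loads(zlib.decompress(blob).decode("utf-8"))


def eager_read(columns, filter_col, predicate, stats):
    """Materialize every value of every column first, then apply the filter."""
    decoded = {
        name: [decode_value(blob, stats) for blob in blobs]
        for name, blobs in columns.items()
    }
    keep = [i for i, v in enumerate(decoded[filter_col]) if predicate(v)]
    return [{name: decoded[name][i] for name in columns} for i in keep]


def lazy_read(columns, filter_col, predicate, stats):
    """Decode only the filter column, then decode other columns for kept rows."""
    filter_values = [decode_value(blob, stats) for blob in columns[filter_col]]
    keep = [i for i, v in enumerate(filter_values) if predicate(v)]
    rows = []
    for i in keep:
        row = {filter_col: filter_values[i]}
        for name, blobs in columns.items():
            if name != filter_col:
                # Only rows that passed the filter pay the decode cost here.
                row[name] = decode_value(blobs[i], stats)
        rows.append(row)
    return rows


# Toy data: ten rows, two columns; the predicate keeps the last two rows.
columns = {
    "id": encode_column(list(range(10))),
    "payload": encode_column([f"row-{i}" for i in range(10)]),
}
eager_stats = {"decoded": 0}
lazy_stats = {"decoded": 0}
eager_rows = eager_read(columns, "id", lambda v: v >= 8, eager_stats)
lazy_rows = lazy_read(columns, "id", lambda v: v >= 8, lazy_stats)
print(eager_stats["decoded"], lazy_stats["decoded"])
```

Both functions return the same rows. In this toy setup the eager path decodes all 20 values, while the lazy path decodes the 10 filter-column values plus 2 payload values, 12 in total. The saving grows with the number of non-filter columns and with how selective the filter is.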
