Re: [DISCUSS] SPIP: Lazy Materialization for Parquet Read Performance Improvement

kazuyuki tanimura Mon, 13 Feb 2023 12:42:06 -0800

Thank you Liang-Chi!

Kazu


> On Feb 11, 2023, at 7:12 PM, L. C. Hsieh <[email protected]> wrote:
> 
> Thanks all for your feedback.
> 
> Given this positive feedback, if there are no other comments or discussion, I 
> will start a vote in the next few days.
> 
> Thank you again!
> 
> On Thu, Feb 2, 2023 at 10:12 AM kazuyuki tanimura 
> <[email protected]> wrote:
> Thank you all for the +1s and for reviewing the SPIP doc.
> 
> Kazu
> 
>> On Feb 1, 2023, at 1:28 AM, Dongjoon Hyun <[email protected] 
>> <mailto:[email protected]>> wrote:
>> 
>> +1
>> 
>> On Wed, Feb 1, 2023 at 12:52 AM Mich Talebzadeh <[email protected] 
>> <mailto:[email protected]>> wrote:
>> +1
>> 
>> 
>>    view my Linkedin profile 
>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>> 
>>  https://en.everybodywiki.com/Mich_Talebzadeh 
>> <https://en.everybodywiki.com/Mich_Talebzadeh>
>>  
>> Disclaimer: Use it at your own risk. Any and all responsibility for any 
>> loss, damage or destruction of data or any other property which may arise 
>> from relying on this email's technical content is explicitly disclaimed. The 
>> author will in no case be liable for any monetary damages arising from such 
>> loss, damage or destruction.
>>  
>> 
>> 
>> On Wed, 1 Feb 2023 at 02:23, huaxin gao <[email protected] 
>> <mailto:[email protected]>> wrote:
>> +1
>> 
>> On Tue, Jan 31, 2023 at 6:10 PM DB Tsai <[email protected] 
>> <mailto:[email protected]>> wrote:
>> +1
>> 
>> Sent from my iPhone
>> 
>>> On Jan 31, 2023, at 4:16 PM, Yuming Wang <[email protected] 
>>> <mailto:[email protected]>> wrote:
>>> 
>>> 
>>> +1.
>>> 
>>> On Wed, Feb 1, 2023 at 7:42 AM kazuyuki tanimura 
>>> <[email protected] <mailto:[email protected]>> wrote:
>>> Great! Much appreciated, Mich!
>>> 
>>> Kazu
>>> 
>>>> On Jan 31, 2023, at 3:07 PM, Mich Talebzadeh <[email protected] 
>>>> <mailto:[email protected]>> wrote:
>>>> 
>>>> Thanks, Kazu.
>>>> 
>>>> I followed that template link and indeed, as you pointed out, it is a common 
>>>> template. If it works, then it is what it is.
>>>> 
>>>> I will be going through your design proposals, and hopefully we can review 
>>>> them.
>>>> 
>>>> Regards,
>>>> 
>>>> Mich
>>>> 
>>>> 
>>>> 
>>>> 
>>>> On Tue, 31 Jan 2023 at 22:34, kazuyuki tanimura <[email protected] 
>>>> <mailto:[email protected]>> wrote:
>>>> Thank you, Mich. I followed the instructions at 
>>>> https://spark.apache.org/improvement-proposals.html 
>>>> <https://spark.apache.org/improvement-proposals.html> and used its 
>>>> template.
>>>> While we are open to revising our design doc, it seems you are 
>>>> proposing that the community change the instructions themselves?
>>>> 
>>>> Kazu
>>>> 
>>>>> On Jan 31, 2023, at 11:24 AM, Mich Talebzadeh <[email protected] 
>>>>> <mailto:[email protected]>> wrote:
>>>>> 
>>>>> Hi,
>>>>> 
>>>>> Thanks for these proposals. Good suggestions. Is this style of breaking 
>>>>> down your approach standard?
>>>>> 
>>>>> My view is that it may make more sense to follow the industry-established 
>>>>> approach of breaking down your technical proposal into:
>>>>> 
>>>>> Background
>>>>> Objective
>>>>> Scope
>>>>> Constraints
>>>>> Assumptions
>>>>> Reporting
>>>>> Deliverables
>>>>> Timelines
>>>>> Appendix
>>>>> Your current approach, using the questions below:
>>>>> 
>>>>> Q1. What are you trying to do? Articulate your objectives using 
>>>>> absolutely no jargon. What are you trying to achieve?
>>>>> Q2. What problem is this proposal NOT designed to solve? What issues is 
>>>>> the suggested proposal not going to address?
>>>>> Q3. How is it done today, and what are the limits of current practice?
>>>>> Q4. What is new in your approach and why do you think it will be 
>>>>> successful?
>>>>> Q5. Who cares? If you are successful, what difference will it make? If 
>>>>> your proposal succeeds, what tangible benefits will it add?
>>>>> Q6. What are the risks?
>>>>> Q7. How long will it take?
>>>>> Q8. What are the midterm and final “exams” to check for success?
>>>>>  
>>>>> may not do justice to your proposal.
>>>>> 
>>>>> HTH
>>>>> 
>>>>> Mich
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> On Tue, 31 Jan 2023 at 17:35, kazuyuki tanimura 
>>>>> <[email protected] <mailto:[email protected]>> wrote:
>>>>> Hi everyone,
>>>>> 
>>>>> I would like to start a discussion on “Lazy Materialization for Parquet 
>>>>> Read Performance Improvement”.
>>>>> 
>>>>> Chao and I propose a Parquet reader with lazy materialization. For 
>>>>> Spark SQL filter operations, evaluating the filters first and lazily 
>>>>> materializing only the values that are used can avoid wasted computation 
>>>>> and improve read performance.
>>>>> The current implementation of Spark requires the values read to be 
>>>>> materialized (i.e. decompressed, decoded, etc.) in memory before the 
>>>>> filters are applied, even though the filters may eventually discard 
>>>>> many of those values.
>>>>> 
>>>>> Our design doc and the SPIP Jira are linked below.
>>>>> SPIP Jira: https://issues.apache.org/jira/browse/SPARK-42256 
>>>>> <https://issues.apache.org/jira/browse/SPARK-42256> 
>>>>> SPIP Doc: 
>>>>> https://docs.google.com/document/d/1Kr3y2fVZUbQXGH0y8AvdCAeWC49QJjpczapiaDvFzME
>>>>>  
>>>>> <https://docs.google.com/document/d/1Kr3y2fVZUbQXGH0y8AvdCAeWC49QJjpczapiaDvFzME>
>>>>> 
>>>>> Liang-Chi was kind enough to shepherd this effort. 
>>>>> 
>>>>> Thank you
>>>>> Kazu
>>>> 
>>> 
>
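
The idea in the original proposal (evaluate the filter first, then materialize only the values that survive it) can be illustrated with a toy sketch. This is not the SPIP's design or Spark's Parquet reader: the page layout, the `zlib`/JSON "encoding", and every function name below are assumptions made up for illustration. It only contrasts decoding every page of every column against decoding the filter column first and then decoding just the pages of the other columns that contain selected rows.

```python
import json
import zlib

PAGE_SIZE = 4  # rows per page; a toy value chosen for this illustration


def encode_column(values):
    """Split a column into pages and compress each page (stand-in for Parquet encoding)."""
    return [
        zlib.compress(json.dumps(values[i:i + PAGE_SIZE]).encode())
        for i in range(0, len(values), PAGE_SIZE)
    ]


def decode_page(page, stats):
    """Decompress and decode one page, counting how many pages were materialized."""
    stats["pages_decoded"] += 1
    return json.loads(zlib.decompress(page).decode())


def eager_read(columns, filter_col, predicate, stats):
    """Materialize every column fully, then apply the filter."""
    decoded = {
        name: [v for page in pages for v in decode_page(page, stats)]
        for name, pages in columns.items()
    }
    n = len(decoded[filter_col])
    keep = [i for i in range(n) if predicate(decoded[filter_col][i])]
    return [{name: decoded[name][i] for name in columns} for i in keep]


def lazy_read(columns, filter_col, predicate, stats):
    """Decode the filter column first, then decode only pages holding surviving rows."""
    filter_values = [
        v for page in columns[filter_col] for v in decode_page(page, stats)
    ]
    keep = [i for i, v in enumerate(filter_values) if predicate(v)]
    needed_pages = sorted({i // PAGE_SIZE for i in keep})
    rows = [{filter_col: filter_values[i]} for i in keep]
    for name, pages in columns.items():
        if name == filter_col:
            continue
        # Pages with no selected rows are never decompressed or decoded.
        cache = {p: decode_page(pages[p], stats) for p in needed_pages}
        for row, i in zip(rows, keep):
            row[name] = cache[i // PAGE_SIZE][i % PAGE_SIZE]
    return rows


# Hypothetical two-column table with 16 rows (4 pages per column).
columns = {
    "id": encode_column(list(range(16))),
    "payload": encode_column([f"row-{i}" for i in range(16)]),
}
eager_stats = {"pages_decoded": 0}
lazy_stats = {"pages_decoded": 0}
eager_rows = eager_read(columns, "id", lambda v: v >= 13, eager_stats)
lazy_rows = lazy_read(columns, "id", lambda v: v >= 13, lazy_stats)
print(eager_stats["pages_decoded"], lazy_stats["pages_decoded"])
```

Both readers return the same rows; the lazy one simply touches fewer pages of the non-filter column when the predicate is selective. How much a real reader saves would depend on filter selectivity and on how selected rows are distributed across pages.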
