Thank you, Mich. I have addressed your point in the SPIP doc.

Kazu

> On Feb 1, 2023, at 2:04 AM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
> 
> 
> In your statement on Q2 in SPIP, you mention and I quote
> 
> "... File formats other than Parquet are beyond the scope of this SPIP.."
> 
> It is important that you explain why you chose Parquet for this work. Apache 
> Parquet <https://parquet.apache.org/> is an open source column-oriented data 
> format that is widely used in the Apache Hadoop ecosystem and beyond. It is 
> designed for efficient data storage and retrieval. Many data warehouses 
> prefer to store data in external storage in Parquet format. For ETL 
> workloads on Spark, it makes sense to optimise data retrieval as much as 
> possible.
> 
> HTH
> 
> 
> On Tue, 31 Jan 2023 at 17:35, kazuyuki tanimura <ktanim...@apple.com.invalid> 
> wrote:
> Hi everyone,
> 
> I would like to start a discussion on "Lazy Materialization for Parquet Read 
> Performance Improvement".
> 
> Chao and I propose a Parquet reader with lazy materialization. For Spark SQL 
> filter operations, evaluating the filters first and lazily materializing only 
> the values that are actually needed avoids wasted computation and improves 
> read performance.
> The current Spark implementation must materialize the values it reads 
> (i.e. decompress, decode, etc.) in memory before applying the filters, 
> even though the filters may eventually discard many of those values.
> 
> Our SPIP Jira ticket and design doc are linked below.
> SPIP Jira: https://issues.apache.org/jira/browse/SPARK-42256
> SPIP Doc: 
> https://docs.google.com/document/d/1Kr3y2fVZUbQXGH0y8AvdCAeWC49QJjpczapiaDvFzME
> 
> Liang-Chi was kind enough to shepherd this effort. 
> 
> Thank you
> Kazu
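
For readers following the thread, the lazy materialization idea in the proposal above could be illustrated with a toy sketch like the one below. This is not Spark's Parquet reader or the design in the SPIP doc. `EncodedColumn`, `make_column`, `eager_read` and `lazy_read` are hypothetical names, and per-value decoding is a simplifying assumption (a real reader works on compressed, encoded pages). The sketch only contrasts "decode everything, then filter" with "decode the filter column, then decode the other columns for the surviving rows".

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class EncodedColumn:
    """Toy stand-in for an encoded column chunk (hypothetical, not a Spark API)."""
    raw: List[bytes]
    decoded_count: int = 0  # number of values materialized so far

    def decode(self, i: int) -> int:
        # Simulates the cost of decompressing/decoding a single value.
        self.decoded_count += 1
        return int(self.raw[i].decode("ascii"))


def make_column(values: List[int]) -> EncodedColumn:
    # "Encode" each integer as ASCII bytes so that decoding is an explicit step.
    return EncodedColumn([str(v).encode("ascii") for v in values])


def eager_read(columns: Dict[str, EncodedColumn], filter_col: str,
               predicate: Callable[[int], bool]) -> List[Dict[str, int]]:
    # Materialize every value of every column first, then apply the filter.
    n = len(columns[filter_col].raw)
    rows = [{name: col.decode(i) for name, col in columns.items()}
            for i in range(n)]
    return [row for row in rows if predicate(row[filter_col])]


def lazy_read(columns: Dict[str, EncodedColumn], filter_col: str,
              predicate: Callable[[int], bool]) -> List[Dict[str, int]]:
    # Step 1: materialize only the column the filter needs.
    fcol = columns[filter_col]
    filter_values = [fcol.decode(i) for i in range(len(fcol.raw))]
    # Step 2: evaluate the filter to find the surviving row indices.
    selected = [i for i, v in enumerate(filter_values) if predicate(v)]
    # Step 3: materialize the remaining columns for surviving rows only.
    result = []
    for i in selected:
        row = {filter_col: filter_values[i]}
        for name, col in columns.items():
            if name != filter_col:
                row[name] = col.decode(i)
        result.append(row)
    return result


def total_decoded(columns: Dict[str, EncodedColumn]) -> int:
    # Sum of decode calls across all columns.
    return sum(col.decoded_count for col in columns.values())
```

In this toy model, with two columns of 10 rows and a filter that keeps 2 rows, the eager path decodes 20 values while the lazy path decodes 12 (10 for the filter column plus 2 for the other column). Both return the same rows.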
