alamb opened a new pull request, #741:
URL: https://github.com/apache/arrow-site/pull/741

   - Closes https://github.com/apache/arrow-rs/issues/8035

   The initial draft was created using Codex with the following prompt, in case anyone is interested:
   
   <details><summary>Details</summary>
   <p>
   
   ```text
   Please write a technical blog post about the new `ParquetPushDecoder` titled
   
   "Push Decoder: Fine-Grained Control over IO and CPU when Reading Parquet Files"
   
   It should have a publish date of December 17, 2025
   
   It should have the same writing style and high level formatting as _posts/2025-10-23-rust-parquet-metadata.md
   
   The blog post will be about the push parquet decoder, and how it can be used to offer more fine grained control over IO and CPU work in the parquet reader.
   
   The blog post would cover:
   
   * Motivation: why do we need a push decoder?
   * Design: how does the push decoder work?
   * Examples: how to use the push decoder in practice
   * Performance: how does the push decoder perform compared to the existing parquet reader?
   * Future work: what are the next steps for the push decoder?
   
   
   In the background section be sure to mention
   * arrow-rs already has push decoders for csv and json (include links to their documentation)
   * we needed two distinct decoders for parquet already, sync and `async` which led to code duplication
   * Hard to integrate (and needed "first party" support for object_store, but why not for other IO sources like OpenDAL?)
   
   The motivation section should mention:
   * how would we support more fine grained pre-fetching (we can prefetch with row groups now)
   * This is the https://sans-io.readthedocs.io/ applied to columnar file formats
   * Include a diagram showing the control flow for a standard "pull" decoder:
   ** the request for the next batch of data eventually results in an IO request issued by the decoder itself
   * Include a diagram for how push decoders work
   
   The examples section should include
   * examples from the documentation https://docs.rs/parquet/latest/parquet/arrow/push_decoder/struct.ParquetPushDecoder.html
   * you can also find the examples here: https://github.com/apache/arrow-rs/blob/main/parquet/src/arrow/push_decoder/mod.rs
   
   Please include details from the following github tickets:
   *  https://github.com/apache/arrow-rs/issues/8035
   * Use the background, motivation, and high level design description on this Github ticket: https://github.com/apache/arrow-rs/issues/7983
   ```
   
   </p>
   </details> 

