[jira] [Updated] (ARROW-10135) [Rust] [Parquet] Refactor file module to help adding sources

Remi Dettai (Jira) Tue, 29 Sep 2020 08:00:09 -0700


     [ https://issues.apache.org/jira/browse/ARROW-10135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Remi Dettai updated ARROW-10135:
--------------------------------
    Description: 
Currently, the Parquet reader is very strongly tied to file system reads. This 
makes it hard to add other sources. For instance, to implement S3, we would 
need a reader that loads entire columns at once rather than doing buffered 
reads of a few KB.
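To make the contrast concrete, here is a minimal sketch of the two access patterns behind one interface. The `ChunkSource` trait, `SeekableSource`, and `PrefetchedChunk` are hypothetical names for illustration only; they are not part of the `parquet` crate's API. The assumption is that an S3-style source would fetch a whole column chunk in one request and then serve reads from memory, while a file source seeks and reads small ranges on demand.

```rust
use std::io::{Cursor, Error, ErrorKind, Read, Seek, SeekFrom};

/// Hypothetical abstraction over where Parquet bytes come from.
/// Illustration only; not an API of the `parquet` crate.
trait ChunkSource {
    /// Return `len` bytes starting at absolute file offset `start`.
    fn read_range(&mut self, start: u64, len: usize) -> std::io::Result<Vec<u8>>;
}

/// File-like source: every request becomes a small seek + read.
struct SeekableSource<R: Read + Seek> {
    inner: R,
}

impl<R: Read + Seek> ChunkSource for SeekableSource<R> {
    fn read_range(&mut self, start: u64, len: usize) -> std::io::Result<Vec<u8>> {
        self.inner.seek(SeekFrom::Start(start))?;
        let mut buf = vec![0u8; len];
        self.inner.read_exact(&mut buf)?;
        Ok(buf)
    }
}

/// Whole-chunk source (assumed S3-style): the entire column chunk was
/// fetched up front, so reads are just slices of an in-memory buffer.
struct PrefetchedChunk {
    /// Absolute file offset at which `data` begins.
    offset: u64,
    data: Vec<u8>,
}

fn out_of_range() -> Error {
    Error::new(ErrorKind::UnexpectedEof, "range outside prefetched chunk")
}

impl ChunkSource for PrefetchedChunk {
    fn read_range(&mut self, start: u64, len: usize) -> std::io::Result<Vec<u8>> {
        // Translate the absolute offset into an index within `data`.
        let begin = start.checked_sub(self.offset).ok_or_else(out_of_range)? as usize;
        let end = begin.checked_add(len).ok_or_else(out_of_range)?;
        self.data
            .get(begin..end)
            .map(|s| s.to_vec())
            .ok_or_else(out_of_range)
    }
}

fn main() -> std::io::Result<()> {
    let file_bytes: Vec<u8> = (0u8..100).collect();
    let mut seekable = SeekableSource { inner: Cursor::new(file_bytes.clone()) };
    let mut prefetched = PrefetchedChunk { offset: 40, data: file_bytes[40..80].to_vec() };

    // Both sources answer the same request identically.
    let a = seekable.read_range(50, 4)?;
    let b = prefetched.read_range(50, 4)?;
    assert_eq!(a, b);
    println!("{:?}", a); // [50, 51, 52, 53]
    Ok(())
}
```

Decoding code written against such an interface would not need to know which access pattern sits underneath.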

To improve modularity, we could try to move as much logic as possible to the 
generic traits (FileReader, RowGroupReader...) and reduce the content of the 
implementing structs (SerializedFileReader, SerializedRowGroupReader...) to the 
part that is specific to file/buffered reads.
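A minimal sketch of the proposed split, assuming a Rust trait whose default methods hold the generic logic while each implementing struct supplies only the source-specific primitives. `RowGroupSource` and `InMemoryRowGroup` are hypothetical names; they do not correspond to the actual methods of the crate's `RowGroupReader` or `SerializedRowGroupReader`.

```rust
/// Hypothetical trait: implementors provide only the source-specific
/// primitives; shared logic lives in default methods.
trait RowGroupSource {
    /// Source-specific: number of column chunks in this row group.
    fn num_columns(&self) -> usize;
    /// Source-specific: raw bytes of column chunk `i`, if it exists.
    fn column_bytes(&self, i: usize) -> Option<Vec<u8>>;

    /// Generic: total byte size of all column chunks.
    fn total_bytes(&self) -> usize {
        (0..self.num_columns())
            .filter_map(|i| self.column_bytes(i))
            .map(|b| b.len())
            .sum()
    }

    /// Generic: fetch every column chunk, or `None` if any is missing.
    fn all_columns(&self) -> Option<Vec<Vec<u8>>> {
        (0..self.num_columns()).map(|i| self.column_bytes(i)).collect()
    }
}

/// Assumed source that already holds every column chunk in memory
/// (e.g. after loading entire columns from object storage).
struct InMemoryRowGroup {
    columns: Vec<Vec<u8>>,
}

impl RowGroupSource for InMemoryRowGroup {
    fn num_columns(&self) -> usize {
        self.columns.len()
    }
    fn column_bytes(&self, i: usize) -> Option<Vec<u8>> {
        self.columns.get(i).cloned()
    }
}

fn main() {
    let rg = InMemoryRowGroup { columns: vec![vec![1, 2, 3], vec![4, 5]] };
    // The generic methods come for free from the trait.
    println!("total = {}", rg.total_bytes()); // total = 5
}
```

With this layout, adding a new source means writing only the two required methods; the default methods are inherited unchanged.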

  was:
Currently, the Parquet reader is very strongly tied to file system reads. This 
makes it hard to add other sources. For instance, to implement S3, we would 
need the a reader that loads entire columns at once rather than buffered reads 
of a few Ko.

To improve modularity, we could try to move as much logic as possible to the 
generic traits (FileReader, RowGroupReader...) and reduce the content of the 
implementing structs (SerializedFileReader, SerializedRowGroupReader...) to the 
part that is specific to file/buffered reads.


> [Rust] [Parquet] Refactor file module to help adding sources
> ------------------------------------------------------------
>
>                 Key: ARROW-10135
>                 URL: https://issues.apache.org/jira/browse/ARROW-10135
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Rust
>    Affects Versions: 1.0.1
>            Reporter: Remi Dettai
>            Priority: Major
>              Labels: parquet, pull-request-available
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> Currently, the Parquet reader is very strongly tied to file system reads. 
> This makes it hard to add other sources. For instance, to implement S3, we 
> would need a reader that loads entire columns at once rather than doing 
> buffered reads of a few KB.
>
> To improve modularity, we could try to move as much logic as possible to the 
> generic traits (FileReader, RowGroupReader...) and reduce the content of the 
> implementing structs (SerializedFileReader, SerializedRowGroupReader...) to 
> the part that is specific to file/buffered reads.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
