[
https://issues.apache.org/jira/browse/ARROW-9275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17184086#comment-17184086
]
Andrew Lamb commented on ARROW-9275:
------------------------------------
In general, I think the notion of implementing async Parquet and Arrow APIs
that don't rely on tokio or other executors is a good idea.
I think in order to make the crate as widely useful as possible, it should also
retain a synchronous API for use with the Rust standard library.
One pattern I have seen is using an `async` crate feature that adds the
appropriate async APIs (and possibly additional dependencies). For example,
https://docs.rs/bzip2/0.4.1/bzip2/#async-io
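As a rough illustration of that pattern (not taken from the bzip2 or parquet
crates), a crate could keep its synchronous API unconditional and gate the
async one behind a Cargo feature. The function names, the `async` feature name,
and the optional `futures` dependency below are all assumptions made for this
sketch:

```rust
// Hypothetical Cargo.toml fragment assumed by this sketch:
//
//   [dependencies]
//   futures = { version = "0.3", optional = true }
//
//   [features]
//   async = ["futures"]

use std::io::{self, Read};

/// Synchronous API: always compiled, needs only the standard library.
pub fn read_magic<R: Read>(mut reader: R) -> io::Result<[u8; 4]> {
    let mut magic = [0u8; 4];
    reader.read_exact(&mut magic)?;
    Ok(magic)
}

/// Asynchronous API: compiled only when the `async` feature is enabled.
/// It is written against the executor-agnostic `futures` IO traits, so it
/// does not pull in tokio or any other runtime.
#[cfg(feature = "async")]
pub async fn read_magic_async<R>(mut reader: R) -> io::Result<[u8; 4]>
where
    R: futures::io::AsyncRead + Unpin,
{
    use futures::io::AsyncReadExt;
    let mut magic = [0u8; 4];
    reader.read_exact(&mut magic).await?;
    Ok(magic)
}
```

Users who only need blocking IO would build with default features and never
compile the async half or its dependency.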
> [Rust] – Async Sans IO: R/W into/to Arrow Arrays
> ------------------------------------------------
>
> Key: ARROW-9275
> URL: https://issues.apache.org/jira/browse/ARROW-9275
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Rust
> Reporter: Mahmut Bulut
> Assignee: Mahmut Bulut
> Priority: Major
>
> This issue can be considered an epic-level issue that spans other Arrow
> projects.
> *Drill down*
> Currently, traits like `ParquetReader` only allow a synchronous interface,
> which uses a BufReader with a constant 8KB buffer. Over the network, this
> becomes a problem. It can be solved with differential buffers. In addition
> to this shortcoming, an executor engine is needed to schedule from async
> trait methods to sync trait methods; it has to sit somewhere in between to
> make requests to external IO asynchronous. On-disk IO is
> acceptable with the approach we currently have, since no reliable evented IO
> exists for on-disk IO on major platforms.
> All things considered, abstractions that expose asynchronous IO without
> taking a side among executors need to be provided.
>
> *Design Suggestions & Considerations*
> The design should apply and consider:
> * Sans IO (for more information about the Sans IO approach, please see
> [https://sans-io.readthedocs.io/] )
> * Not including any executor-specific data at all.
> * Tests should work with any executor with little to no modification.
> * Buffers are adjusted accordingly and use differential buffers to optimize
> network trips.
> * Sync IO shouldn't be touched, at any cost. If we try to unify the sync IO
> traits or write overlapping implementations, that will make our life harder
> in the future. Sans IO should be compartmentalized.
>
> *Notes*
> If the Sans IO approach is not taken, the project will:
> * use an extreme number of dependencies.
> * not be compatible with other Rust code at all.
> * break currently working code that uses array ingestion.
> * make integration tests harder.
> * be really hard to adapt (in user projects) once completion-based APIs
> stabilize in the future.
>
> Note that this suggestion is not about the Flight format or any
> Flight-related information at the moment. It is purely about making on-disk
> and remote IO (provider backends like AWS etc.) async.
>
> *Open points*
> A couple of open points:
> * Identifying traits that are going to be asyncized.
> * Designing internal routines.
> * Deciding the package name to expose.
> * Gathering traits into the designated packages for all file formats.
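
For readers unfamiliar with the Sans IO idea referenced in the description, a
minimal sketch follows. It is not the Parquet or Arrow format and not an
existing API in the Rust crates: `FrameDecoder` and its length-prefixed framing
(4-byte little-endian length, then payload) are hypothetical, chosen only to
show a parser that never performs IO itself. The caller obtains bytes however
it likes (blocking read, any async executor, a test vector) and feeds them in:

```rust
/// Hypothetical sans-IO decoder: it owns a buffer and parsing state,
/// but never reads from a file, socket, or executor.
#[derive(Default)]
pub struct FrameDecoder {
    buf: Vec<u8>,
}

impl FrameDecoder {
    pub fn new() -> Self {
        Self::default()
    }

    /// Feed bytes obtained from any source, in chunks of any size.
    pub fn push(&mut self, bytes: &[u8]) {
        self.buf.extend_from_slice(bytes);
    }

    /// Return the next complete frame, or `None` if more bytes are needed.
    pub fn next_frame(&mut self) -> Option<Vec<u8>> {
        if self.buf.len() < 4 {
            return None; // length prefix not complete yet
        }
        let len =
            u32::from_le_bytes([self.buf[0], self.buf[1], self.buf[2], self.buf[3]]) as usize;
        if self.buf.len() < 4 + len {
            return None; // payload not complete yet
        }
        let frame = self.buf[4..4 + len].to_vec();
        self.buf.drain(..4 + len); // discard the consumed prefix and payload
        Some(frame)
    }
}
```

Because the decoder has no IO or executor dependency, the same type could sit
behind both a sync and an async front end, and its tests run without any
runtime.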