asfimport opened a new issue, #372:
URL: https://github.com/apache/arrow-java/issues/372
Arrow is a great way to exchange data between systems. Somewhere in the
process, however, data must be loaded into, and read out of, the Arrow vectors.
Arrow's vector code started with similar code in Apache Drill. The Drill
project created a "Row Set" abstraction that:
- Provides a simple way to define the schema for a set of batches.
- Loads data into vectors from row-oriented inputs.
- Reads data out of vectors as row-oriented output.
- Controls memory consumed by the record batch when loading data into
vectors.
- Ensures maximum usage of the allocated vector space when loading data into
vectors.
- Optionally handles projection when reading data from an input file into a
set of vectors.
- Optionally handles data conversion from input to vector formats.
This mechanism is handy for any Java developer who produces or consumes
Arrow vectors.
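To make the idea concrete, here is a minimal sketch of a row-oriented writer that fills columnar storage while enforcing a memory budget. It is plain Java and does not use the Drill or Arrow APIs: the class `RowBatchWriter`, its methods, and the fixed two-column schema (`id`, `name`) are all hypothetical names chosen for illustration, and the byte accounting is a simplification of what real vectors would need.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch, not the Drill or Arrow API: accepts rows one at a
 * time, stores them column by column, and refuses a row that would push
 * the batch past its byte budget.
 */
final class RowBatchWriter {
    private final int byteBudget;
    // One list per column; a real implementation would write into vectors.
    private final List<Integer> ids = new ArrayList<>();
    private final List<String> names = new ArrayList<>();
    private int bytesUsed;

    RowBatchWriter(int byteBudget) {
        this.byteBudget = byteBudget;
    }

    /**
     * Appends one row. Returns false and writes nothing if the row would
     * exceed the budget, signalling that the batch is full.
     */
    boolean addRow(int id, String name) {
        // Simplified size estimate: 4 bytes for the int plus the UTF-8 payload.
        int rowBytes = Integer.BYTES + name.getBytes(StandardCharsets.UTF_8).length;
        if (bytesUsed + rowBytes > byteBudget) {
            return false;
        }
        ids.add(id);
        names.add(name);
        bytesUsed += rowBytes;
        return true;
    }

    int rowCount() {
        return ids.size();
    }

    int bytesUsed() {
        return bytesUsed;
    }

    // Row-oriented read access over the column-oriented storage.
    int idAt(int row) {
        return ids.get(row);
    }

    String nameAt(int row) {
        return names.get(row);
    }
}
```

A caller would keep calling `addRow` until it returns `false`, hand the full batch downstream, and then start a new writer with the rejected row. The actual Row Set mechanism covers more than this sketch does, including schema definition, projection, and type conversion.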
Detailed information is available in [this
wiki](https://github.com/paul-rogers/arrow/wiki), including a more detailed
description of the motivation for this project, and an analysis of the work
required to do the Drill-to-Arrow port.
The code is in Java simply because Drill is written in Java. The same
mechanisms can be ported to other languages if useful. Those ports would be
separate future projects.
The code will be placed in a new Java module which can be imported by
projects that wish to use the code. Changes may be needed to expose items from
the `vector` module; we'll tackle those issues if/when they occur.
**Reporter**: [Paul Rogers](https://issues.apache.org/jira/browse/ARROW-3164)
<sub>**Note**: *This issue was originally created as
[ARROW-3164](https://issues.apache.org/jira/browse/ARROW-3164). Please see the
[migration documentation](https://github.com/apache/arrow/issues/14542) for
further details.*</sub>