ericphanson commented on code in PR #491:
URL: https://github.com/apache/arrow-julia/pull/491#discussion_r1360394242
##########
docs/src/manual.md:
##########
@@ -48,7 +48,7 @@ table = Arrow.Table("data.arrow")
### `Arrow.Table`
-The type of `table` in this example will be an `Arrow.Table`. When "reading"
the arrow data, `Arrow.Table` first
["mmaps"](https://en.wikipedia.org/wiki/Mmap) the `data.arrow` file, which is
an important technique for dealing with data larger than the available RAM on
a system. By "mmapping" a file, the OS doesn't load the entire file contents
into RAM at once; instead, file contents are "swapped" into RAM as different
regions of the file are requested. Once "mmapped", `Arrow.Table` then inspects
the metadata in the file to determine the number of columns, their names and
types, the byte offset at which each column begins in the file data, and even
how many "batches" are included in the file (arrow tables may be partitioned
into one or more "record batches", each containing a portion of the data).
Armed with all the appropriate metadata, `Arrow.Table` then creates custom
array objects ([`ArrowVector`](@ref)), which act as "views" into the raw
arrow memory bytes. This is a significant point: no extra memory is allocated
for "data" when reading arrow data. This is in contrast to reading data from a
csv file as columns into Julia structures, where we would need to allocate
those array structures ourselves, then parse the file, "filling in" each
element of the array with the data we parsed from the file. Arrow data, on the
other hand, is *already laid out in memory or on disk* in a binary format, and
as long as we have the metadata to interpret the raw bytes, we can figure out
whether to treat those bytes as a `Vector{Float64}`, etc. A sample of the
kinds of arrow array types you might see when deserializing arrow data
includes:
+The type of `table` in this example will be an `Arrow.Table`. When "reading"
the arrow data, `Arrow.Table` first
["mmaps"](https://en.wikipedia.org/wiki/Mmap) the `data.arrow` file, which is
an important technique for dealing with data larger than the available RAM on
a system. By "mmapping" a file, the OS doesn't load the entire file contents
into RAM at once; instead, file contents are "swapped" into RAM as different
regions of the file are requested. Once "mmapped", `Arrow.Table` then inspects
the metadata in the file to determine the number of columns, their names and
types, the byte offset at which each column begins in the file data, and even
how many "batches" are included in the file (arrow tables may be partitioned
into one or more "record batches", each containing a portion of the data).
Armed with all the appropriate metadata, `Arrow.Table` then creates custom
array objects ([`Arrow.ArrowVector`](@ref)), which act as "views" into the raw
arrow memory bytes. This is a significant point: no extra memory is allocated
for "data" when reading arrow data. This is in contrast to reading data from a
csv file as columns into Julia structures, where we would need to allocate
those array structures ourselves, then parse the file, "filling in" each
element of the array with the data we parsed from the file. Arrow data, on the
other hand, is *already laid out in memory or on disk* in a binary format, and
as long as we have the metadata to interpret the raw bytes, we can figure out
whether to treat those bytes as a `Vector{Float64}`, etc. A sample of the
kinds of arrow array types you might see when deserializing arrow data
includes:
Review Comment:
Only change here is `ArrowVector` -> `Arrow.ArrowVector`, since the
cross-reference doesn't resolve without qualification (the type isn't
exported).
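
For context, here's a minimal sketch of the view-based reading behavior the
docs paragraph describes. It assumes a `data.arrow` file like the one in the
docs example, and uses `Tables.getcolumn` from the Tables.jl interface that
`Arrow.Table` implements:

```julia
using Arrow, Tables

# Assumes "data.arrow" (the file name from the docs example) exists and
# holds Arrow-formatted data; reading mmaps the file rather than copying it.
table = Arrow.Table("data.arrow")

# Columns are subtypes of the unexported Arrow.ArrowVector, acting as
# "views" into the mmapped bytes; no per-element parsing or allocation.
col = Tables.getcolumn(table, 1)
col isa Arrow.ArrowVector  # true

# Materialize an independent Julia Array only when one is actually needed:
v = collect(col)
```

This is the point the docs stress about no extra memory being allocated for
"data" at read time: `collect` is where an ordinary heap-allocated `Vector`
is finally created.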