ericphanson commented on code in PR #491:
URL: https://github.com/apache/arrow-julia/pull/491#discussion_r1360394242
##########
docs/src/manual.md:
##########
@@ -48,7 +48,7 @@ table = Arrow.Table("data.arrow")
### `Arrow.Table`
-The type of `table` in this example will be an `Arrow.Table`. When "reading"
the arrow data, `Arrow.Table` first
["mmaps"](https://en.wikipedia.org/wiki/Mmap) the `data.arrow` file, which is
an important technique for dealing with data larger than the available RAM on
a system. By "mmapping" a file, the OS doesn't load the entire file contents
into RAM at once; instead, file contents are "swapped" into RAM as different
regions of the file are requested. Once "mmapped", `Arrow.Table` then inspects
the metadata in the file to determine the number of columns, their names and
types, the byte offset at which each column begins in the file data, and even
how many "batches" are included in the file (arrow tables may be partitioned
into one or more "record batches", each containing a portion of the data).
Armed with all the appropriate metadata, `Arrow.Table` then creates custom
array objects ([`ArrowVector`](@ref)), which act as "views" into the raw
arrow memory bytes. This is a significant point: no extra memory is allocated
for "data" when reading arrow data. This is in contrast to reading data from a
csv file as columns into Julia structures, where we would need to allocate
those array structures ourselves, then parse the file, "filling in" each
element of the array with the data we parsed from the file. Arrow data, on the
other hand, is *already laid out in memory or on disk* in a binary format, and
as long as we have the metadata to interpret the raw bytes, we can figure out
whether to treat those bytes as a `Vector{Float64}`, etc. A sample of the
kinds of arrow array types you might see when deserializing arrow data
includes:
+The type of `table` in this example will be an `Arrow.Table`. When "reading"
the arrow data, `Arrow.Table` first
["mmaps"](https://en.wikipedia.org/wiki/Mmap) the `data.arrow` file, which is
an important technique for dealing with data larger than the available RAM on
a system. By "mmapping" a file, the OS doesn't load the entire file contents
into RAM at once; instead, file contents are "swapped" into RAM as different
regions of the file are requested. Once "mmapped", `Arrow.Table` then inspects
the metadata in the file to determine the number of columns, their names and
types, the byte offset at which each column begins in the file data, and even
how many "batches" are included in the file (arrow tables may be partitioned
into one or more "record batches", each containing a portion of the data).
Armed with all the appropriate metadata, `Arrow.Table` then creates custom
array objects ([`Arrow.ArrowVector`](@ref)), which act as "views" into the raw
arrow memory bytes. This is a significant point: no extra memory is allocated
for "data" when reading arrow data. This is in contrast to reading data from a
csv file as columns into Julia structures, where we would need to allocate
those array structures ourselves, then parse the file, "filling in" each
element of the array with the data we parsed from the file. Arrow data, on the
other hand, is *already laid out in memory or on disk* in a binary format, and
as long as we have the metadata to interpret the raw bytes, we can figure out
whether to treat those bytes as a `Vector{Float64}`, etc. A sample of the
kinds of arrow array types you might see when deserializing arrow data
includes:
Review Comment:
Only change here is `ArrowVector` -> `Arrow.ArrowVector`, since the
cross-reference doesn't resolve without qualification (the type isn't
exported).
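
For context, here's a minimal sketch of the view-based reading behavior the
docs paragraph describes. It assumes a `data.arrow` file like the one in the
docs example, and uses `Tables.getcolumn` from the Tables.jl interface that
`Arrow.Table` implements:

```julia
using Arrow, Tables

# Assumes "data.arrow" (the file name from the docs example) exists and
# holds Arrow-formatted data; reading mmaps the file rather than copying it.
table = Arrow.Table("data.arrow")

# Columns are subtypes of the unexported Arrow.ArrowVector, acting as
# "views" into the mmapped bytes; no per-element parsing or allocation.
col = Tables.getcolumn(table, 1)
col isa Arrow.ArrowVector  # true

# Materialize an independent Julia Array only when one is actually needed:
v = collect(col)
```

This is the point the docs stress about no extra memory being allocated for
"data" at read time: `collect` is where an ordinary heap-allocated `Vector`
is finally created.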