baumgold commented on code in PR #400:
URL: https://github.com/apache/arrow-julia/pull/400#discussion_r1134709709
##########
src/write.jl:
##########
@@ -48,14 +48,29 @@ Supported keyword arguments to `Arrow.write` include:
* `metadata=Arrow.getmetadata(tbl)`: the metadata that should be written as
the table's schema's `custom_metadata` field; must either be `nothing` or an
iterable of `<:AbstractString` pairs.
* `ntasks::Int`: number of buffered threaded tasks to allow while writing
input partitions out as arrow record batches; default is no limit; for
unbuffered writing, pass `ntasks=0`
* `file::Bool=false`: if a an `io` argument is being written to, passing
`file=true` will cause the arrow file format to be written instead of just IPC
streaming
+ * `chunksize::Union{Nothing,Integer}=64000`: if a table is being written,
this will cause the table to be partitioned into chunks of the given size
(`chunksize` rows); if `nothing`, no partitioning will occur
"""
function write end
write(io_or_file; kw...) = x -> write(io_or_file, x; kw...)
-function write(file_path, tbl; kwargs...)
+function write(file_path, tbl; chunksize::Union{Nothing,Integer}=64000,
kwargs...)
Review Comment:
I think `chunksize` should become a new field of `Writer`, with its default
kwarg value set in the `Base.open` constructor on L170. That would eliminate
the code duplication between the two `write` methods.
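
A rough sketch of the shape this suggestion could take. Everything here is hypothetical: `ToyWriter`, `open_writer`, `chunks`, and `toy_write` are illustrative stand-ins, not Arrow.jl's actual `Writer` or `Base.open` method, and writing chunk lengths to the stream is only a placeholder for emitting record batches. The point is that the `chunksize` default is declared once, and both entry points forward their keyword arguments to it.

```julia
# Hypothetical stand-in for a writer type; NOT Arrow.jl's actual `Writer`.
struct ToyWriter{T<:IO}
    io::T
    chunksize::Union{Nothing,Int}
end

# The only place where the default chunksize is declared (assumed name).
open_writer(io::IO; chunksize::Union{Nothing,Integer}=64000) =
    ToyWriter(io, chunksize === nothing ? nothing : Int(chunksize))

# Split `rows` according to the writer's chunksize; `nothing` means one chunk.
function chunks(w::ToyWriter, rows::AbstractVector)
    w.chunksize === nothing && return [rows]
    return collect(Iterators.partition(rows, w.chunksize))
end

# IO entry point: no chunksize default here, kwargs go to `open_writer`.
function toy_write(io::IO, rows; kwargs...)
    w = open_writer(io; kwargs...)
    for chunk in chunks(w, rows)
        println(w.io, length(chunk))  # placeholder for writing one batch
    end
    return io
end

# Path entry point: opens the file and delegates to the IO method.
function toy_write(path::AbstractString, rows; kwargs...)
    open(path, "w") do io
        toy_write(io, rows; kwargs...)
    end
    return path
end
```

With this layout, `toy_write(io, rows; chunksize=2)` and `toy_write("out.txt", rows; chunksize=2)` share one code path, and omitting `chunksize` falls back to the single default in `open_writer`.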
##########
src/write.jl:
##########
@@ -278,9 +293,23 @@ function Base.close(writer::Writer)
nothing
end
-function write(io::IO, tbl; kwargs...)
+function write(io::IO, tbl; chunksize::Union{Nothing,Integer}=64000, kwargs...)
+ # rowaccces is a necessary pre-requisite for row-iteration (not sufficient
though)
+ if !isnothing(chunksize) && Tables.istable(tbl) && Tables.rowaccess(tbl)
+ @assert chunksize >= 0 "chunksize must be >= 0"
+ if hasmethod(Iterators.partition,(typeof(tbl),))
+ tbl_source = Iterators.partition(tbl, chunksize)
Review Comment:
Can we use `Iterators.partition` from Base rather than the `DataFrames`
method, to avoid adding one more dependency?
https://docs.julialang.org/en/v1/base/iterators/#Base.Iterators.partition
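
For illustration, a small sketch of Base's `Iterators.partition(collection, n)` applied to row-oriented data with no extra packages. The vector of `NamedTuple`s and the names `rows`, `batches`, and `lengths` are assumptions made for this example only; whether a given table type in the PR can be partitioned this way depends on that type.

```julia
# A vector of NamedTuples stands in for a row-oriented table (assumption).
rows = [(id = i, value = i^2) for i in 1:10]

chunksize = 4

# Base's partition takes the collection and the chunk length;
# the last chunk holds whatever rows remain.
batches = collect(Iterators.partition(rows, chunksize))

lengths = map(length, batches)  # [4, 4, 2]
```

Each element of `batches` is a view into `rows`, so the rows themselves are not copied when partitioning a `Vector`.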