This is an automated email from the ASF dual-hosted git repository.
nevime pushed a change to branch rust-parquet-arrow-writer
in repository https://gitbox.apache.org/repos/asf/arrow.git.
discard 8f83f22 ARROW-8423: [Rust] [Parquet] Serialize Arrow schema metadata
discard c424f5c ARROW-8289: [Rust] Parquet Arrow writer with nested support
add 6d02508 ARROW-9699: [C++][Compute] Optimize mode kernel for small
integer types
add 7ed91f7 ARROW-9702: [C++] Register bpacking SIMD to runtime path.
add f8b285b ARROW-8001: [R][Dataset] Bindings for dataset writing
add 74e64d0 ARROW-9855: [R] Fix bad merge/Rcpp conflict
add 4c4193d ARROW-9813: [C++] Disable semantic interposition
add fec740b ARROW-9816: [C++] Escape quotes in config.h
add 46b6dc6 ARROW-9464: [Rust] [DataFusion] Physical plan optimization
rule to insert MergeExec when needed
add d02e166 ARROW-9849: [Rust] [DataFusion] Simplified argument types of
ScalarFunctions.
add 92e01cc ARROW-9844: [CI] Add Go build job on s390x
add b4063cc ARROW-9853: [RUST] Implement take kernel for dictionary arrays
add 7ce498e PARQUET-1904: [C++] Export file_offset in RowGroupMetaData
add 52e2c75 ARROW-9795: [C++][Gandiva] Implement castTIMESTAMP(int64) in
Gandiva
add 668b4b7 ARROW-9723: [C++][Compute] Count NaN in mode kernel
add 6e04489 ARROW-9811: [C++] Unchecked floating point division by 0
should succeed
add 0f33e9e ARROW-9871: [C++] Add uppercase to ARROW_USER_SIMD_LEVEL
add 67983cf ARROW-9660: [C++] Revamp dictionary association in IPC
add b72fab3 ARROW-9877: [C++] Fix homebrew-cpp build fail on AVX512
add d8ae71a ARROW-9876: [C++] Faster ARM build on Travis-CI
add 2c60c8e ARROW-9823: [CI][C++][MinGW] Enable S3
add f8c9c8b ARROW-9851: [C++] Disable AVX512 runtime paths with Valgrind
add 0cfddaf ARROW-9883: [R] Fix linuxlibs.R install script for R < 3.6
add df3bee2 ARROW-9850:[Go] Defer should not be used inside a loop
add 2a3b989 ARROW-8383: [Rust] Allow easier access to keys array of a
dictionary array
add f023ed4 ARROW-9882: [C++/Python] Update OSX build to
conda-forge-ci-setup=3
add 31b2a52 ARROW-9646: [C++][Dataset] Support writing with
ParquetFileFormat
add de87636 ARROW-9884: [R] Bindings for writing datasets to Parquet
add 87b85af ARROW-8493: [C++][Parquet] Start populating repeated ancestor
definition
add 8f3b029 ARROW-9867: [C++][Dataset] Add FileSystemDataset::filesystem
property
add 1790751 ARROW-9886: [Rust] [DataFusion] Parameterized testing of
physical cast.
add a898ee1 ARROW-9887: [Rust] [DataFusion] Added support for complex
return types for built-in functions
add 8813eac ARROW-9658: [Python] Python bindings for dataset writing
add 8455e33 ARROW-9815: [Rust][DataFusion] Remove the use of Arc/Mutex to
protect plan time structures
add 956502c ARROW-9889: [Rust][DataFusion] Implement physical plan for
EmptyRelation
add 1b4e2c7 ARROW-9629: [Python] Fix kartothek integration tests by
fixing dependencies
add 7e39711 ARROW-9875: [Python] Let FileSystem.get_file_info accept a
single path
add 2272d9a ARROW-9642: [C++] Let MakeBuilder refer DictionaryType's
index_type for deciding the starting bit width of the indices
add 46dee85 ARROW-9874: [C++] Add sink-owning version of IPC writers
add 8eb49fe ARROW-9821: [Rust][DataFusion] Make crate::logical_plan and
crate::physical_plan modules
add 9759280 ARROW-9858: [Python][Docs] Add user guide for filesystems
interface
add 78b96de ARROW-7226: [Python][Doc] Add note re: JSON format support
add 1a14298 ARROW-9794: [C++] Add IsVendor API for CpuInfo
add 5a3291c ARROW-9605: [C++] Speed up aggregate min/max compute kernels
on integer types
add 823fe60 ARROW-9873: [C++][Compute] Optimize mode kernel for integers
in small value range
add 2a0fc0a ARROW-9845: [Rust] [Parquet] Move serde_json dependency to
dev-dependencies as it is only used in tests
add b747b5a ARROW-9891: [Rust] [DataFusion] Made math functions accept
f32.
add 8910af1 ARROW-9892: [Rust] [DataFusion] Added concat of utf8
add 8cd854a ARROW-9900: [Rust][DataFusion] Switch from Box -> Arc in
LogicalPlanNode
add b5feede ARROW-9583: [Rust] Fix offsets in result of arithmetic kernels
add 51c71e6 ARROW-9888: [Rust][DataFusion] Allow ExecutionContext to be
shared between threads (again)
add 54b715c ARROW-9885: [Rust][DataFusion] Minor code simplification
add 5d3d48a ARROW-9852: [C++] Validate dictionaries fully when combining
deltas
add 975e166 ARROW-9852: [C++] Add more IPC fuzz regression files
add 247996e ARROW-9899: [Rust] [DataFusion] Switch from Box<Schema> -->
SchemaRef (Arc<Schema>) to be consistent with the rest of Arrow
add e2ae212 ARROW-9863: [C++][Parquet] Compile regexes only once
add 9eeaf21 ARROW-9916: [RUST] Avoid cloning array data
add ce6a28b ARROW-9921: [Rust] Replace TryFrom by From in `StringArray`
from `Vec<Option<&str>>` (+50%)
add aaabcb4 ARROW-9910: [Rust][DataFusion] Fixed error in type coercion
of Variadic.
add 16ebc8a ARROW-9914: [Rust][DataFusion] Document SQL Type --> Arrow
type mapping
add 27f50c7 ARROW-9928: [C++] Speed up integer parsing slightly
add e2fbac5 ARROW-9901: [C++] Add hand-crafted Parquet to Arrow
reconstruction tests
add 69239b4 ARROW-9904: [C++] Unroll the loop of CountSetBits.
add 76e2ac5 ARROW-9913: [C++] Make outputs of Decimal128::FromString
independent of the presence of one another.
add 192f639 ARROW-9929: [Dev] Autotune cmake-format
add a56e483 ARROW-9718: [Python] ParquetWriter to work with new
FileSystem API
add 5d66bc5 ARROW-9908: [Rust] Add support for temporal types in JSON
reader
add 4186a66 ARROW-9836: [Rust][DataFusion] Improve API for usage of UDFs
add 20d854e ARROW-9925: [GLib] Add low level value readers for
GArrowListArray family
add 3e3e18b ARROW-9926: [GLib] Use placement new for
GArrowRecordBatchFileReader
add b89d192 ARROW-9917: [Python][Compute] Bindings for mode kernel
add 9ea2409 ARROW-9919: [Rust][DataFusion] Speedup math operations by 15%+
add da641aa ARROW-9920: [Python] Validate input to pa.concat_arrays() to
avoid segfault
add 54f8d28 ARROW-9821: [Rust][DataFusion] Support for User Defined
ExtensionNodes in the LogicalPlan
add eefc90a ARROW-9588: [C++] Partially support building with clang in an
MSVC setting
add a5969ae ARROW-9864: [Python] Support pathlib.path in
pq.write_to_dataset
add 8c4fa35 ARROW-9827: [C++][Dataset] Skip parsing RowGroup metadata
statistics when there is no filter
add 148cb3d ARROW-9893: [Python] Support parquet options in dataset
writing
add 3ce1a86 ARROW-9931: [C++] Fix undefined behaviour on invalid IPC input
add 87640f5 ARROW-9751: [Rust] [DataFusion] Allow UDFs to accept multiple
data types per argument
add 30143fc ARROW-9895: [Rust] Improve sorting kernels
add 1c35365 ARROW-9814: [Python] Fix crash in
test_parquet::test_read_partitioned_directory_s3fs
add 2095c89 ARROW-9906: [C++] Keep S3 filesystem alive through open file
objects
add d7f6e9f ARROW-9936: [Python] Fix / test relative file paths in
pyarrow.parquet
add 3daebaa ARROW-9953: [R] Declare minimum version for bit64
add 83ef24e ARROW-9806: [R] More compute kernel bindings
add 5c3beb3 ARROW-9890: [R] Add zstandard compression codec in macOS build
add 23e3db7 ARROW-9944: [Rust][DataFusion] Implement to_timestamp function
add 9921c83 ARROW-9837: [Rust][DataFusion] Added provider for variable
add f9643a9 ARROW-9104: [C++] Parquet encryption tests should write files
to a temporary directory instead of the testing submodule's directory
add 175c53d ARROW-9949: [C++] Improve performance of
Decimal128::FromString by 46%, and make the implementation reusable for
Decimal256.
add 986eab4 ARROW-9968: [C++] Fix UBSAN build
add b77e8ae ARROW-9854: [R] Support reading/writing data to/from S3
add 1d0e96a ARROW-9972: [CI] Work around grpc-re2 clash on Homebrew
add d33b458 ARROW-9957: [Rust] Replace tempdir with tempfile
add 874c65f ARROW-9966: [Rust] Speedup kernels for sum,min,max by 10%-60%
add cd94749 ARROW-9868: [C++][R] Provide CopyFiles for copying files
between FileSystems
add 974a74d ARROW-5034: [C#] ArrowStreamWriter and ArrowFileWriter
implement sync WriteRecordBatch
add 687a7eb ARROW-9271: [R] Preserve data frame metadata in round trip
add 5ec4ef0 ARROW-9387: [R] Use new C++ table select method
add f977855 ARROW-9979: [Rust] Fix arrow crate clippy lints
add 2726a71 ARROW-9950: [Rust] [DataFusion] Made UDFs usable without a
registry
add a371dde ARROW-9954: [Rust] [DataFusion] Made aggregates support the
same signatures as functions.
add ad82762 ARROW-9961: [Rust][DataFusion] Make to_timestamp function
parses timestamp without timezone offset as local
add ce16763 ARROW-9790: [Rust][Parquet]: Increase test coverage in
arrow_reader.rs
add c6994f1 ARROW-9980: [Rust] [Parquet] Fix clippy lints
new 2b5e102 ARROW-8289: [Rust] Parquet Arrow writer with nested support
new 81f1020 ARROW-8423: [Rust] [Parquet] Serialize Arrow schema metadata
This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
branch are not in the new version. This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:
 * -- * -- B -- O -- O -- O   (8f83f22)
            \
             N -- N -- N   refs/heads/rust-parquet-arrow-writer (81f1020)
You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions
from the common base, B.
Any revisions marked "omit" are not gone; other references still
refer to them. Any revisions marked "discard" are gone forever.
The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails. The revisions
listed as "add" were already present in the repository and have only
been added to this reference.
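
The "discard", "omit", "add" and "new" labels described above can be illustrated with a small sketch. This is not the notification script's actual implementation; it assumes a toy commit graph held in a dict (commit id to list of parent ids), and the names `ancestors`, `classify_update`, `other_tips` and the commit ids are hypothetical, chosen only for illustration.

```python
def ancestors(graph, tip):
    """Return every commit reachable from `tip`, including `tip` itself."""
    seen = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(graph.get(commit, []))
    return seen


def classify_update(graph, old_tip, new_tip, other_tips):
    """Label the revisions affected by moving a branch from old_tip to new_tip.

    other_tips are the tips of all other references in the repository
    (assumed here to be unchanged by the push).
    """
    old_side = ancestors(graph, old_tip)
    new_side = ancestors(graph, new_tip)
    elsewhere = set()
    for tip in other_tips:
        elsewhere |= ancestors(graph, tip)

    removed = old_side - new_side   # the O revisions in the diagram
    added = new_side - old_side     # the N revisions in the diagram
    return {
        "discard": removed - elsewhere,  # no other reference keeps them
        "omit": removed & elsewhere,     # still reachable from another reference
        "add": added & elsewhere,        # already in the repository, new to this branch
        "new": added - elsewhere,        # entirely new to the repository
    }


# Toy history: B is the common base, O1/O2 were on the old branch,
# A1/A2 already exist on another branch, N1/N2 are freshly pushed on top of A2.
graph = {
    "B": [],
    "O1": ["B"], "O2": ["O1"],
    "A1": ["B"], "A2": ["A1"],
    "N1": ["A2"], "N2": ["N1"],
}
result = classify_update(graph, old_tip="O2", new_tip="N2", other_tips=["A2"])
for label in ("discard", "omit", "add", "new"):
    print(label, sorted(result[label]))
```

In this toy history the two old revisions are discarded, the two revisions shared with the other branch are only added to this reference, and the two rebased revisions are new, which mirrors the shape of the update reported in this email.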
Summary of changes:
.github/workflows/comment_bot.yml | 9 +-
.github/workflows/cpp.yml | 3 +-
.github/workflows/ruby.yml | 1 +
.travis.yml | 12 +
c_glib/arrow-glib/composite-array.cpp | 176 ++
c_glib/arrow-glib/composite-array.h | 24 +
c_glib/arrow-glib/compute.cpp | 2 +-
c_glib/arrow-glib/reader.cpp | 5 +-
c_glib/arrow-glib/writer.cpp | 8 +-
c_glib/doc/arrow-glib/arrow-glib-docs.xml | 4 +
c_glib/test/test-large-list-array.rb | 62 +-
c_glib/test/test-list-array.rb | 61 +-
ci/docker/conda-python-kartothek.dockerfile | 6 +
ci/scripts/cpp_build.sh | 1 +
ci/scripts/cpp_test.sh | 12 +
ci/scripts/integration_kartothek.sh | 2 +-
ci/scripts/msys2_setup.sh | 1 +
cpp/Brewfile | 4 +-
cpp/CMakeLists.txt | 17 +-
cpp/cmake_modules/BuildUtils.cmake | 18 +-
cpp/cmake_modules/DefineOptions.cmake | 13 +-
cpp/cmake_modules/FindArrow.cmake | 13 +-
cpp/cmake_modules/FindBoostAlt.cmake | 2 +-
cpp/cmake_modules/FindGTest.cmake | 4 +-
cpp/cmake_modules/FindLz4.cmake | 2 +-
cpp/cmake_modules/FindThrift.cmake | 4 +-
cpp/cmake_modules/SetupCxxFlags.cmake | 20 +-
cpp/cmake_modules/ThirdpartyToolchain.cmake | 6 +-
cpp/cmake_modules/san-config.cmake | 7 +-
cpp/examples/minimal_build/example.cc | 4 +-
cpp/src/arrow/CMakeLists.txt | 29 +-
cpp/src/arrow/array/array_dict_test.cc | 86 +-
cpp/src/arrow/array/array_test.cc | 29 +
cpp/src/arrow/array/builder_adaptive.cc | 12 +-
cpp/src/arrow/array/builder_adaptive.h | 21 +-
cpp/src/arrow/array/builder_base.cc | 16 +
cpp/src/arrow/array/builder_base.h | 6 +
cpp/src/arrow/array/builder_dict.h | 64 +-
cpp/src/arrow/array/concatenate.cc | 8 +-
cpp/src/arrow/array/data.cc | 2 +
cpp/src/arrow/buffer.cc | 11 +-
cpp/src/arrow/builder.cc | 15 +-
cpp/src/arrow/compare.cc | 28 +-
cpp/src/arrow/compare.h | 4 +-
cpp/src/arrow/compute/api_aggregate.h | 17 -
cpp/src/arrow/compute/kernels/aggregate_basic.cc | 225 +--
...gregate_sum_avx2.cc => aggregate_basic_avx2.cc} | 17 +
...ate_sum_avx512.cc => aggregate_basic_avx512.cc} | 17 +
.../compute/kernels/aggregate_basic_internal.h | 292 +++-
.../arrow/compute/kernels/aggregate_benchmark.cc | 12 +
cpp/src/arrow/compute/kernels/aggregate_mode.cc | 190 ++-
cpp/src/arrow/compute/kernels/aggregate_test.cc | 239 ++-
cpp/src/arrow/compute/kernels/scalar_arithmetic.cc | 4 -
.../compute/kernels/scalar_arithmetic_test.cc | 28 +-
cpp/src/arrow/dataset/discovery.cc | 11 +-
cpp/src/arrow/dataset/file_base.cc | 16 +-
cpp/src/arrow/dataset/file_base.h | 18 +-
cpp/src/arrow/dataset/file_csv.h | 8 +-
cpp/src/arrow/dataset/file_ipc.cc | 4 +-
cpp/src/arrow/dataset/file_ipc.h | 2 +-
cpp/src/arrow/dataset/file_ipc_test.cc | 305 +---
cpp/src/arrow/dataset/file_parquet.cc | 57 +-
cpp/src/arrow/dataset/file_parquet.h | 19 +-
cpp/src/arrow/dataset/file_parquet_test.cc | 77 +
cpp/src/arrow/dataset/file_test.cc | 2 +-
cpp/src/arrow/dataset/filter.cc | 2 +-
cpp/src/arrow/dataset/test_util.h | 327 +++-
cpp/src/arrow/dataset/type_fwd.h | 2 +
cpp/src/arrow/filesystem/CMakeLists.txt | 6 +
cpp/src/arrow/filesystem/filesystem.cc | 30 +
cpp/src/arrow/filesystem/filesystem.h | 12 +
cpp/src/arrow/filesystem/filesystem_test.cc | 22 +
cpp/src/arrow/filesystem/s3_test_util.h | 9 +-
cpp/src/arrow/filesystem/s3fs.cc | 101 +-
cpp/src/arrow/filesystem/s3fs_narrative_test.cc | 2 +-
cpp/src/arrow/filesystem/s3fs_test.cc | 44 +-
cpp/src/arrow/filesystem/test_util.cc | 5 +-
cpp/src/arrow/flight/CMakeLists.txt | 3 +-
cpp/src/arrow/flight/internal.cc | 6 +-
cpp/src/arrow/flight/perf_server.cc | 6 +-
cpp/src/arrow/flight/server.cc | 25 +-
cpp/src/arrow/flight/test_util.cc | 10 +-
cpp/src/arrow/gpu/cuda_test.cc | 2 +-
cpp/src/arrow/ipc/dictionary.cc | 361 ++--
cpp/src/arrow/ipc/dictionary.h | 154 +-
cpp/src/arrow/ipc/feather.cc | 2 +-
cpp/src/arrow/ipc/feather_test.cc | 10 +-
cpp/src/arrow/ipc/file_to_stream.cc | 4 +-
cpp/src/arrow/ipc/generate_fuzz_corpus.cc | 4 +-
cpp/src/arrow/ipc/metadata_internal.cc | 110 +-
cpp/src/arrow/ipc/metadata_internal.h | 9 +-
cpp/src/arrow/ipc/read_write_benchmark.cc | 6 +-
cpp/src/arrow/ipc/read_write_test.cc | 162 +-
cpp/src/arrow/ipc/reader.cc | 166 +-
cpp/src/arrow/ipc/stream_to_file.cc | 2 +-
cpp/src/arrow/ipc/test_common.cc | 113 +-
cpp/src/arrow/ipc/test_common.h | 11 +-
cpp/src/arrow/ipc/writer.cc | 93 +-
cpp/src/arrow/ipc/writer.h | 44 +-
cpp/src/arrow/json/chunked_builder.cc | 4 +-
cpp/src/arrow/python/flight.cc | 5 +-
cpp/src/arrow/python/flight.h | 2 +-
cpp/src/arrow/python/type_traits.h | 24 +-
cpp/src/arrow/scalar.cc | 4 +-
cpp/src/arrow/scalar.h | 4 +-
cpp/src/arrow/testing/generator.h | 13 +
cpp/src/arrow/testing/gtest_util.cc | 40 +-
cpp/src/arrow/testing/gtest_util.h | 11 +-
cpp/src/arrow/testing/json_integration.cc | 13 +-
cpp/src/arrow/testing/json_integration_test.cc | 53 +-
cpp/src/arrow/testing/json_internal.cc | 773 ++++-----
cpp/src/arrow/testing/json_internal.h | 11 +-
cpp/src/arrow/util/bit_stream_utils.h | 1 +
cpp/src/arrow/util/bitmap_ops.cc | 20 +-
cpp/src/arrow/util/{bpacking.h => bpacking.cc} | 44 +-
cpp/src/arrow/util/bpacking.h | 122 +-
cpp/src/arrow/util/bpacking_avx2.cc | 137 ++
.../mod.rs => cpp/src/arrow/util/bpacking_avx2.h | 16 +-
..._avx512_codegen.py => bpacking_avx2_codegen.py} | 94 +-
cpp/src/arrow/util/bpacking_avx2_generated.h | 1786 ++++++++++++++++++++
cpp/src/arrow/util/bpacking_avx512.cc | 137 ++
.../mod.rs => cpp/src/arrow/util/bpacking_avx512.h | 16 +-
cpp/src/arrow/util/bpacking_avx512_codegen.py | 27 +-
cpp/src/arrow/util/bpacking_avx512_generated.h | 445 ++---
cpp/src/arrow/util/bpacking_default.h | 3 +
cpp/src/arrow/util/cpu_info.cc | 66 +-
cpp/src/arrow/util/cpu_info.h | 23 +-
cpp/src/arrow/util/decimal.cc | 133 +-
cpp/src/arrow/util/decimal_test.cc | 56 +-
cpp/src/arrow/util/dispatch.h | 115 ++
cpp/src/arrow/util/int128_internal.h | 48 +
cpp/src/arrow/util/io_util.h | 1 +
cpp/src/arrow/util/value_parsing.h | 101 +-
cpp/src/gandiva/CMakeLists.txt | 2 +-
cpp/src/gandiva/function_registry_datetime.cc | 3 +
cpp/src/gandiva/precompiled/time.cc | 2 +
cpp/src/gandiva/precompiled/types.h | 1 +
cpp/src/parquet/CMakeLists.txt | 7 +-
cpp/src/parquet/arrow/arrow_schema_test.cc | 357 +++-
cpp/src/parquet/arrow/reader.cc | 11 +-
cpp/src/parquet/arrow/reconstruct_internal_test.cc | 1652 ++++++++++++++++++
cpp/src/parquet/arrow/schema.cc | 131 +-
cpp/src/parquet/arrow/schema.h | 20 +-
cpp/src/parquet/arrow/writer.cc | 3 +-
cpp/src/parquet/column_writer.cc | 5 +-
cpp/src/parquet/column_writer.h | 12 +-
.../parquet/encryption_read_configurations_test.cc | 4 +-
.../encryption_write_configurations_test.cc | 11 +-
cpp/src/parquet/exception.h | 14 +-
cpp/src/parquet/level_conversion.h | 108 ++
cpp/src/parquet/metadata.cc | 7 +-
cpp/src/parquet/metadata.h | 7 +
cpp/src/parquet/test_encryption_util.h | 13 +
cpp/src/plasma/CMakeLists.txt | 2 +-
csharp/src/Apache.Arrow/Apache.Arrow.csproj | 3 +-
.../Extensions/StreamExtensions.netcoreapp2.1.cs | 5 +
.../Extensions/StreamExtensions.netstandard.cs | 21 +
...oreapp2.1.cs => TupleExtensions.netstandard.cs} | 10 +-
csharp/src/Apache.Arrow/Ipc/ArrowFileWriter.cs | 91 +
csharp/src/Apache.Arrow/Ipc/ArrowStreamWriter.cs | 218 ++-
.../Apache.Arrow.Tests/ArrowFileWriterTests.cs | 35 +-
.../Apache.Arrow.Tests/ArrowStreamWriterTests.cs | 227 ++-
dev/tasks/conda-recipes/azure.osx.yml | 4 +-
.../homebrew-formulae/autobrew/apache-arrow.rb | 2 +
docker-compose.yml | 1 +
docs/source/python/api/filesystems.rst | 12 +-
docs/source/python/filesystems.rst | 159 +-
docs/source/python/json.rst | 11 +-
go/arrow/internal/cpu/cpu_s390x.go | 7 +
go/arrow/ipc/cmd/arrow-cat/main.go | 4 +-
go/arrow/ipc/cmd/arrow-ls/main.go | 2 +-
go/arrow/ipc/file_reader.go | 2 +-
python/pyarrow/_dataset.pyx | 144 +-
python/pyarrow/_fs.pyx | 22 +-
python/pyarrow/_json.pyx | 3 +-
python/pyarrow/_parquet.pxd | 18 +
python/pyarrow/_parquet.pyx | 305 ++--
python/pyarrow/array.pxi | 3 +
python/pyarrow/compute.py | 26 +
python/pyarrow/dataset.py | 72 +-
python/pyarrow/filesystem.py | 18 +-
python/pyarrow/fs.py | 62 +-
python/pyarrow/includes/libarrow.pxd | 12 +-
python/pyarrow/includes/libarrow_dataset.pxd | 21 +
python/pyarrow/ipc.pxi | 21 +-
python/pyarrow/parquet.py | 39 +-
python/pyarrow/tests/test_array.py | 11 +
python/pyarrow/tests/test_compute.py | 25 +
python/pyarrow/tests/test_dataset.py | 240 ++-
python/pyarrow/tests/test_fs.py | 5 +
python/pyarrow/tests/test_parquet.py | 166 +-
python/pyarrow/tests/util.py | 10 +
python/pyarrow/types.pxi | 15 +-
r/DESCRIPTION | 10 +-
r/NAMESPACE | 9 +
r/NEWS.md | 14 +-
r/R/arrowExports.R | 28 +-
r/R/compute.R | 56 +-
r/R/csv.R | 9 +-
r/R/dataset-factory.R | 150 ++
r/R/dataset-format.R | 131 ++
r/R/dataset-partition.R | 113 ++
r/R/dataset-scan.R | 170 ++
r/R/dataset-write.R | 60 +-
r/R/dataset.R | 526 +-----
r/R/feather.R | 6 +-
r/R/filesystem.R | 37 +-
r/R/io.R | 23 +-
r/R/ipc_stream.R | 8 +-
r/R/json.R | 7 +-
r/R/parquet.R | 80 +-
r/R/record-batch.R | 37 +-
r/R/table.R | 51 +-
r/man/Dataset.Rd | 5 +-
r/man/FileFormat.Rd | 2 +-
r/man/ParquetFileWriter.Rd | 2 +-
r/man/Partitioning.Rd | 2 +-
r/man/RecordBatch.Rd | 5 +-
r/man/Scanner.Rd | 2 +-
r/man/Table.Rd | 5 +-
r/man/arrow-package.Rd | 2 +-
r/man/dataset_factory.Rd | 2 +-
r/man/hive_partition.Rd | 2 +-
r/man/make_readable_file.Rd | 6 +-
r/man/map_batches.Rd | 2 +-
r/man/match_arrow.Rd | 23 +
r/man/read_delim_arrow.Rd | 2 +-
r/man/read_feather.Rd | 4 +-
r/man/read_ipc_stream.Rd | 4 +-
r/man/read_json_arrow.Rd | 2 +-
r/man/read_parquet.Rd | 4 +-
r/man/write_dataset.Rd | 19 +-
r/man/write_feather.Rd | 2 +-
r/man/write_ipc_stream.Rd | 2 +-
r/man/write_parquet.Rd | 25 +-
r/src/arrowExports.cpp | 109 +-
r/src/arrow_cpp11.h | 1 +
r/src/dataset.cpp | 26 +-
r/src/filesystem.cpp | 16 +
r/src/recordbatch.cpp | 5 +-
r/src/recordbatchwriter.cpp | 4 +-
r/src/schema.cpp | 4 +-
r/src/symbols.cpp | 1 +
r/src/table.cpp | 142 +-
r/tests/testthat/test-Table.R | 19 +-
r/tests/testthat/test-compute-aggregate.R | 95 +-
r/tests/testthat/test-dataset.R | 181 +-
r/tests/testthat/test-metadata.R | 8 +
r/tests/testthat/test-s3.R | 52 +
r/tools/autobrew | 2 +-
r/tools/linuxlibs.R | 5 +-
r/vignettes/dataset.Rmd | 88 +-
r/vignettes/fs.Rmd | 59 +
rust/arrow/Cargo.toml | 8 +
rust/arrow/benches/aggregate_kernels.rs | 68 +
rust/arrow/benches/arithmetic_kernels.rs | 63 +-
rust/arrow/benches/array_from_vec.rs | 22 +
rust/arrow/benches/buffer_bit_ops.rs | 188 +++
rust/arrow/benches/cast_kernels.rs | 4 +-
rust/arrow/benches/csv_writer.rs | 2 +-
rust/arrow/benches/length_kernel.rs | 2 +-
rust/arrow/benches/take_kernels.rs | 4 +-
rust/arrow/examples/builders.rs | 6 +-
rust/arrow/examples/read_csv.rs | 2 +-
rust/arrow/examples/read_csv_infer_schema.rs | 2 +-
rust/arrow/src/array/array.rs | 327 ++--
rust/arrow/src/array/builder.rs | 41 +-
rust/arrow/src/array/cast.rs | 10 +
rust/arrow/src/array/equal.rs | 122 +-
rust/arrow/src/array/mod.rs | 3 +-
rust/arrow/src/array/null.rs | 2 +-
rust/arrow/src/array/ord.rs | 273 ++-
rust/arrow/src/array/union.rs | 20 +-
rust/arrow/src/buffer.rs | 319 ++--
rust/arrow/src/compute/kernels/aggregate.rs | 62 +-
rust/arrow/src/compute/kernels/arithmetic.rs | 127 +-
rust/arrow/src/compute/kernels/boolean.rs | 152 +-
rust/arrow/src/compute/kernels/cast.rs | 52 +-
rust/arrow/src/compute/kernels/comparison.rs | 29 +-
rust/arrow/src/compute/kernels/concat.rs | 56 +-
rust/arrow/src/compute/kernels/filter.rs | 8 +-
rust/arrow/src/compute/kernels/length.rs | 18 +-
rust/arrow/src/compute/kernels/limit.rs | 6 +-
rust/arrow/src/compute/kernels/sort.rs | 622 +++++--
rust/arrow/src/compute/kernels/take.rs | 109 +-
rust/arrow/src/compute/util.rs | 109 +-
rust/arrow/src/csv/reader.rs | 10 +-
rust/arrow/src/datatypes.rs | 13 +-
rust/arrow/src/ipc/convert.rs | 2 +-
rust/arrow/src/ipc/gen/mod.rs | 5 +
rust/arrow/src/ipc/mod.rs | 5 +
rust/arrow/src/ipc/reader.rs | 2 +-
rust/arrow/src/ipc/writer.rs | 2 +-
rust/arrow/src/json/reader.rs | 202 ++-
rust/arrow/src/lib.rs | 6 +-
rust/arrow/src/tensor.rs | 12 +-
rust/arrow/src/util/integration_util.rs | 6 +-
rust/arrow/src/util/pretty.rs | 4 +-
rust/benchmarks/src/bin/nyctaxi.rs | 2 +-
rust/benchmarks/src/bin/tpch.rs | 2 +-
rust/datafusion/Cargo.toml | 11 +-
rust/datafusion/README.md | 44 +-
rust/datafusion/benches/math_query_sql.rs | 100 ++
rust/datafusion/benches/sort_limit_query_sql.rs | 133 ++
rust/datafusion/examples/simple_udf.rs | 146 ++
rust/datafusion/src/dataframe.rs | 19 +-
rust/datafusion/src/datasource/csv.rs | 6 +-
rust/datafusion/src/datasource/datasource.rs | 5 +-
rust/datafusion/src/datasource/memory.rs | 4 +-
rust/datafusion/src/datasource/parquet.rs | 4 +-
rust/datafusion/src/execution/context.rs | 363 ++--
rust/datafusion/src/execution/dataframe_impl.rs | 54 +-
rust/datafusion/src/execution/mod.rs | 1 -
.../execution/physical_plan/math_expressions.rs | 118 --
.../src/execution/physical_plan/planner.rs | 510 ------
rust/datafusion/src/execution/physical_plan/udf.rs | 160 --
rust/datafusion/src/lib.rs | 4 +-
.../src/{logicalplan.rs => logical_plan/mod.rs} | 491 +++---
rust/datafusion/src/optimizer/filter_push_down.rs | 27 +-
rust/datafusion/src/optimizer/mod.rs | 1 -
rust/datafusion/src/optimizer/optimizer.rs | 17 +-
.../src/optimizer/projection_push_down.rs | 37 +-
rust/datafusion/src/optimizer/type_coercion.rs | 148 --
rust/datafusion/src/optimizer/utils.rs | 50 +-
rust/datafusion/src/physical_plan/aggregates.rs | 203 +++
.../src/{execution => }/physical_plan/common.rs | 2 +-
.../src/{execution => }/physical_plan/csv.rs | 25 +-
.../src/physical_plan/datetime_expressions.rs | 367 ++++
rust/datafusion/src/physical_plan/empty.rs | 139 ++
.../src/{execution => }/physical_plan/explain.rs | 37 +-
.../{execution => }/physical_plan/expressions.rs | 485 +++---
.../src/{execution => }/physical_plan/filter.rs | 29 +-
rust/datafusion/src/physical_plan/functions.rs | 432 +++++
.../physical_plan/hash_aggregate.rs | 82 +-
.../src/{execution => }/physical_plan/limit.rs | 140 +-
.../src/physical_plan/math_expressions.rs | 98 ++
.../src/{execution => }/physical_plan/memory.rs | 19 +-
.../src/{execution => }/physical_plan/merge.rs | 53 +-
.../src/{execution => }/physical_plan/mod.rs | 51 +-
.../src/{execution => }/physical_plan/parquet.rs | 25 +-
rust/datafusion/src/physical_plan/planner.rs | 797 +++++++++
.../{execution => }/physical_plan/projection.rs | 33 +-
.../src/{execution => }/physical_plan/sort.rs | 161 +-
.../src/physical_plan/string_expressions.rs | 68 +
rust/datafusion/src/physical_plan/type_coercion.rs | 338 ++++
rust/datafusion/src/physical_plan/udf.rs | 106 ++
rust/datafusion/src/prelude.rs | 6 +-
rust/datafusion/src/sql/planner.rs | 164 +-
rust/datafusion/src/test/mod.rs | 48 +-
rust/datafusion/src/test/variable.rs | 58 +
.../{optimizer/optimizer.rs => variable/mod.rs} | 24 +-
rust/datafusion/tests/customer.csv | 4 +
rust/datafusion/tests/sql.rs | 149 +-
rust/datafusion/tests/user_defined_plan.rs | 510 ++++++
.../src/bin/arrow-json-integration-test.rs | 2 +-
rust/parquet/Cargo.toml | 2 +-
rust/parquet/src/arrow/array_reader.rs | 76 +-
rust/parquet/src/arrow/arrow_reader.rs | 158 +-
rust/parquet/src/arrow/converter.rs | 18 +-
rust/parquet/src/arrow/record_reader.rs | 12 +-
rust/parquet/src/column/page.rs | 30 +-
rust/parquet/src/column/reader.rs | 40 +-
rust/parquet/src/column/writer.rs | 52 +-
rust/parquet/src/compression.rs | 14 +-
rust/parquet/src/data_type.rs | 12 +-
rust/parquet/src/encodings/decoding.rs | 20 +-
rust/parquet/src/encodings/encoding.rs | 47 +-
rust/parquet/src/encodings/levels.rs | 13 +-
rust/parquet/src/encodings/rle.rs | 25 +-
rust/parquet/src/file/metadata.rs | 14 +-
rust/parquet/src/file/properties.rs | 4 +-
rust/parquet/src/file/reader.rs | 19 +-
rust/parquet/src/file/statistics.rs | 14 +-
rust/parquet/src/file/writer.rs | 24 +-
rust/parquet/src/lib.rs | 6 +
rust/parquet/src/record/api.rs | 33 +-
rust/parquet/src/record/reader.rs | 12 +-
rust/parquet/src/record/triplet.rs | 9 +-
rust/parquet/src/schema/parser.rs | 16 +-
rust/parquet/src/schema/printer.rs | 44 +-
rust/parquet/src/schema/types.rs | 18 +-
rust/parquet/src/schema/visitor.rs | 2 +-
rust/parquet/src/util/bit_packing.rs | 224 +--
rust/parquet/src/util/bit_util.rs | 28 +-
rust/parquet/src/util/io.rs | 12 +-
rust/parquet/src/util/memory.rs | 11 +-
rust/parquet/src/util/test_common/page_util.rs | 4 +-
rust/parquet/src/util/test_common/rand_gen.rs | 4 +-
testing | 2 +-
389 files changed, 19946 insertions(+), 6789 deletions(-)
rename cpp/src/arrow/compute/kernels/{aggregate_sum_avx2.cc =>
aggregate_basic_avx2.cc} (80%)
rename cpp/src/arrow/compute/kernels/{aggregate_sum_avx512.cc =>
aggregate_basic_avx512.cc} (80%)
copy cpp/src/arrow/util/{bpacking.h => bpacking.cc} (81%)
create mode 100644 cpp/src/arrow/util/bpacking_avx2.cc
copy rust/arrow/src/ipc/gen/mod.rs => cpp/src/arrow/util/bpacking_avx2.h (79%)
copy cpp/src/arrow/util/{bpacking_avx512_codegen.py =>
bpacking_avx2_codegen.py} (63%)
create mode 100644 cpp/src/arrow/util/bpacking_avx2_generated.h
create mode 100644 cpp/src/arrow/util/bpacking_avx512.cc
copy rust/arrow/src/ipc/gen/mod.rs => cpp/src/arrow/util/bpacking_avx512.h
(79%)
create mode 100644 cpp/src/arrow/util/dispatch.h
create mode 100644 cpp/src/arrow/util/int128_internal.h
create mode 100644 cpp/src/parquet/arrow/reconstruct_internal_test.cc
copy csharp/src/Apache.Arrow/Extensions/{StreamExtensions.netcoreapp2.1.cs =>
TupleExtensions.netstandard.cs} (76%)
create mode 100644 go/arrow/internal/cpu/cpu_s390x.go
create mode 100644 r/R/dataset-factory.R
create mode 100644 r/R/dataset-format.R
create mode 100644 r/R/dataset-partition.R
create mode 100644 r/R/dataset-scan.R
create mode 100644 r/man/match_arrow.Rd
create mode 100644 r/tests/testthat/test-s3.R
create mode 100644 r/vignettes/fs.Rmd
create mode 100644 rust/arrow/benches/aggregate_kernels.rs
create mode 100644 rust/arrow/benches/buffer_bit_ops.rs
create mode 100644 rust/datafusion/benches/math_query_sql.rs
create mode 100644 rust/datafusion/benches/sort_limit_query_sql.rs
create mode 100644 rust/datafusion/examples/simple_udf.rs
delete mode 100644
rust/datafusion/src/execution/physical_plan/math_expressions.rs
delete mode 100644 rust/datafusion/src/execution/physical_plan/planner.rs
delete mode 100644 rust/datafusion/src/execution/physical_plan/udf.rs
rename rust/datafusion/src/{logicalplan.rs => logical_plan/mod.rs} (77%)
delete mode 100644 rust/datafusion/src/optimizer/type_coercion.rs
create mode 100644 rust/datafusion/src/physical_plan/aggregates.rs
rename rust/datafusion/src/{execution => }/physical_plan/common.rs (99%)
rename rust/datafusion/src/{execution => }/physical_plan/csv.rs (93%)
create mode 100644 rust/datafusion/src/physical_plan/datetime_expressions.rs
create mode 100644 rust/datafusion/src/physical_plan/empty.rs
rename rust/datafusion/src/{execution => }/physical_plan/explain.rs (73%)
rename rust/datafusion/src/{execution => }/physical_plan/expressions.rs (88%)
rename rust/datafusion/src/{execution => }/physical_plan/filter.rs (88%)
create mode 100644 rust/datafusion/src/physical_plan/functions.rs
rename rust/datafusion/src/{execution => }/physical_plan/hash_aggregate.rs
(92%)
rename rust/datafusion/src/{execution => }/physical_plan/limit.rs (64%)
create mode 100644 rust/datafusion/src/physical_plan/math_expressions.rs
rename rust/datafusion/src/{execution => }/physical_plan/memory.rs (88%)
rename rust/datafusion/src/{execution => }/physical_plan/merge.rs (81%)
rename rust/datafusion/src/{execution => }/physical_plan/mod.rs (78%)
rename rust/datafusion/src/{execution => }/physical_plan/parquet.rs (92%)
create mode 100644 rust/datafusion/src/physical_plan/planner.rs
rename rust/datafusion/src/{execution => }/physical_plan/projection.rs (83%)
rename rust/datafusion/src/{execution => }/physical_plan/sort.rs (52%)
create mode 100644 rust/datafusion/src/physical_plan/string_expressions.rs
create mode 100644 rust/datafusion/src/physical_plan/type_coercion.rs
create mode 100644 rust/datafusion/src/physical_plan/udf.rs
create mode 100644 rust/datafusion/src/test/variable.rs
copy rust/datafusion/src/{optimizer/optimizer.rs => variable/mod.rs} (65%)
create mode 100644 rust/datafusion/tests/customer.csv
create mode 100644 rust/datafusion/tests/user_defined_plan.rs