Hello community,

here is the log from the commit of package apache-arrow for openSUSE:Factory checked in at 2023-11-14 21:42:29
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Comparing /work/SRC/openSUSE:Factory/apache-arrow (Old)
 and /work/SRC/openSUSE:Factory/.apache-arrow.new.17445 (New)
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Package is "apache-arrow"

Tue Nov 14 21:42:29 2023 rev:6 rq:1125775 version:14.0.1

Changes:
--------
--- /work/SRC/openSUSE:Factory/apache-arrow/apache-arrow.changes 2023-09-08 21:17:01.769662402 +0200
+++ /work/SRC/openSUSE:Factory/.apache-arrow.new.17445/apache-arrow.changes 2023-11-14 21:42:33.447564693 +0100
@@ -1,0 +2,10 @@
+Mon Nov 13 23:51:00 UTC 2023 - Ondřej Súkup <[email protected]>
+
+- update 14.0.1
+  * GH-38431 - [Python][CI] Update fs.type_name checks for s3fs tests
+  * GH-38607 - [Python] Disable PyExtensionType autoload
+- update to 14.0.1
+  * very long list of changes can be found here:
+    https://arrow.apache.org/release/14.0.0.html
+
+-------------------------------------------------------------------

Old:
----
  apache-arrow-13.0.0.tar.gz
  arrow-testing-13.0.0.tar.gz
  parquet-testing-13.0.0.tar.gz

New:
----
  apache-arrow-14.0.1.tar.gz
  arrow-testing-14.0.1.tar.gz
  parquet-testing-14.0.1.tar.gz

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Other differences:
------------------
++++++ apache-arrow.spec ++++++
--- /var/tmp/diff_new_pack.1s1C5C/_old 2023-11-14 21:42:34.499603636 +0100
+++ /var/tmp/diff_new_pack.1s1C5C/_new 2023-11-14 21:42:34.503603785 +0100
@@ -20,13 +20,13 @@
 # Required for runtime dispatch, not yet packaged
 %bcond_with xsimd
 
-%define sonum 1300
+%define sonum 1400
 # See git submodule /testing pointing to the correct revision
 %define arrow_testing_commit 47f7b56b25683202c1fd957668e13f2abafc0f12
 # See git submodule /cpp/submodules/parquet-testing pointing to the correct revision
-%define parquet_testing_commit b2e7cc755159196e3a068c8594f7acbaecfdaaac
+%define parquet_testing_commit e45cd23f784aab3d6bf0701f8f4e621469ed3be7
 Name: apache-arrow
-Version: 13.0.0
+Version: 14.0.1
 Release: 0
 Summary: A development platform for in-memory data
 License: Apache-2.0 AND BSD-3-Clause AND BSD-2-Clause AND MIT

++++++ apache-arrow-13.0.0.tar.gz -> apache-arrow-14.0.1.tar.gz ++++++
/work/SRC/openSUSE:Factory/apache-arrow/apache-arrow-13.0.0.tar.gz /work/SRC/openSUSE:Factory/.apache-arrow.new.17445/apache-arrow-14.0.1.tar.gz differ: char 18, line 1

++++++ arrow-testing-13.0.0.tar.gz -> arrow-testing-14.0.1.tar.gz ++++++

++++ no output (probably identical)

++++++ parquet-testing-13.0.0.tar.gz -> parquet-testing-14.0.1.tar.gz ++++++
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/parquet-testing-b2e7cc755159196e3a068c8594f7acbaecfdaaac/data/README.md new/parquet-testing-e45cd23f784aab3d6bf0701f8f4e621469ed3be7/data/README.md
--- old/parquet-testing-b2e7cc755159196e3a068c8594f7acbaecfdaaac/data/README.md 2023-03-06 12:29:32.000000000 +0100
+++ new/parquet-testing-e45cd23f784aab3d6bf0701f8f4e621469ed3be7/data/README.md 2023-09-25 18:06:41.000000000 +0200
@@ -44,6 +44,7 @@
 | rle-dict-snappy-checksum.parquet | compressed and dictionary-encoded INT32 and STRING columns in format v2 with a matching CRC |
 | plain-dict-uncompressed-checksum.parquet | uncompressed and dictionary-encoded INT32 and STRING columns in format v1 with a matching CRC |
 | rle-dict-uncompressed-corrupt-checksum.parquet | uncompressed and dictionary-encoded INT32 and STRING columns in format v2 with a mismatching CRC |
+| large_string_map.brotli.parquet | MAP(STRING, INT32) with a string column chunk of more than 2GB. See [note](#large-string-map) below |
 
 TODO: Document what each file is in the table above.
 
@@ -56,14 +57,15 @@
 https://github.com/apache/parquet-format/blob/encryption/Encryption.md
 ```
 
-Following are the keys and key ids (when using key\_retriever) used to encrypt the encrypted columns and footer in the all the encrypted files:
+Following are the keys and key ids (when using key\_retriever) used to encrypt
+the encrypted columns and footer in all the encrypted files:
 
 * Encrypted/Signed Footer:
   * key: {0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5}
   * key_id: "kf"
-* Encrypted column named double_field:
+* Encrypted column named double_field (including column and offset index):
   * key: {1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,0}
   * key_id: "kc1"
-* Encrypted column named float_field:
+* Encrypted column named float_field (including column and offset index):
   * key: {1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,1}
   * key_id: "kc2"
@@ -72,10 +74,11 @@
 2. encrypt\_columns\_and\_footer\_aad.parquet.encrypted
 
-A sample that reads and checks these files can be found at the following tests:
+A sample that reads and checks these files can be found at the following tests
+in Parquet C++:
 
 ```
-cpp/src/parquet/encryption-read-configurations-test.cc
-cpp/src/parquet/test-encryption-util.h
+cpp/src/parquet/encryption/read-configurations-test.cc
+cpp/src/parquet/encryption/test-encryption-util.h
 ```
 
 The `external_key_material_java.parquet.encrypted` file was encrypted using parquet-mr with
@@ -202,3 +205,21 @@
 # total_compressed_size: 84
 # total_uncompressed_size: 80
 ```
+
+## Large string map
+
+The file `large_string_map.brotli.parquet` was generated with:
+```python
+import pyarrow as pa
+import pyarrow.parquet as pq
+
+arr = pa.array([[("a" * 2**30, 1)]], type = pa.map_(pa.string(), pa.int32()))
+arr = pa.chunked_array([arr, arr])
+tab = pa.table({ "arr": arr })
+
+pq.write_table(tab, "test.parquet", compression='BROTLI')
+```
+
+It is meant to exercise reading of structured data where each value
+is smaller than 2GB but the combined uncompressed column chunk size
+is greater than 2GB.
Binary files old/parquet-testing-b2e7cc755159196e3a068c8594f7acbaecfdaaac/data/encrypt_columns_and_footer.parquet.encrypted and new/parquet-testing-e45cd23f784aab3d6bf0701f8f4e621469ed3be7/data/encrypt_columns_and_footer.parquet.encrypted differ
Binary files old/parquet-testing-b2e7cc755159196e3a068c8594f7acbaecfdaaac/data/encrypt_columns_and_footer_aad.parquet.encrypted and new/parquet-testing-e45cd23f784aab3d6bf0701f8f4e621469ed3be7/data/encrypt_columns_and_footer_aad.parquet.encrypted differ
Binary files old/parquet-testing-b2e7cc755159196e3a068c8594f7acbaecfdaaac/data/encrypt_columns_and_footer_ctr.parquet.encrypted and new/parquet-testing-e45cd23f784aab3d6bf0701f8f4e621469ed3be7/data/encrypt_columns_and_footer_ctr.parquet.encrypted differ
Binary files old/parquet-testing-b2e7cc755159196e3a068c8594f7acbaecfdaaac/data/encrypt_columns_and_footer_disable_aad_storage.parquet.encrypted and new/parquet-testing-e45cd23f784aab3d6bf0701f8f4e621469ed3be7/data/encrypt_columns_and_footer_disable_aad_storage.parquet.encrypted differ
Binary files old/parquet-testing-b2e7cc755159196e3a068c8594f7acbaecfdaaac/data/encrypt_columns_plaintext_footer.parquet.encrypted and new/parquet-testing-e45cd23f784aab3d6bf0701f8f4e621469ed3be7/data/encrypt_columns_plaintext_footer.parquet.encrypted differ
Binary files old/parquet-testing-b2e7cc755159196e3a068c8594f7acbaecfdaaac/data/large_string_map.brotli.parquet and new/parquet-testing-e45cd23f784aab3d6bf0701f8f4e621469ed3be7/data/large_string_map.brotli.parquet differ
Binary files old/parquet-testing-b2e7cc755159196e3a068c8594f7acbaecfdaaac/data/uniform_encryption.parquet.encrypted and new/parquet-testing-e45cd23f784aab3d6bf0701f8f4e621469ed3be7/data/uniform_encryption.parquet.encrypted differ
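
Note on the "Large string map" README addition in the diff above: the size claim can be checked with plain arithmetic, without allocating the strings. This is only a sketch. It assumes "2GB" means the signed 32-bit limit of 2**31 - 1 bytes (pa.string() uses 32-bit offsets), and the variable names are illustrative, not taken from the README.

```python
# Illustrative arithmetic only; nothing here allocates the 1 GiB strings.
# Assumption: the README's "2GB" is the signed 32-bit limit, 2**31 - 1 bytes.
INT32_MAX = 2**31 - 1

value_size_bytes = 2**30   # length of the map key "a" * 2**30 (ASCII, 1 byte per char)
num_values = 2             # pa.chunked_array([arr, arr]) holds the value twice

combined_bytes = value_size_bytes * num_values

# Each individual string stays addressable with 32-bit offsets ...
print(value_size_bytes <= INT32_MAX)
# ... but the two together exceed what a 32-bit size can represent.
print(combined_bytes > INT32_MAX)
```

The combined figure counts only the string key bytes; the INT32 map values and any encoding overhead would add to it.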
