alamb commented on code in PR #4381: URL: https://github.com/apache/arrow-rs/pull/4381#discussion_r1222980792
########## arrow-array/src/array/boolean_array.rs: ########## @@ -27,51 +27,47 @@ use std::sync::Arc; /// An array of [boolean values](https://arrow.apache.org/docs/format/Columnar.html#fixed-size-primitive-layout) /// -/// # Examples +/// # Example: From a Vec /// -/// Construction +/// ``` +/// # use arrow_array::{Array, BooleanArray}; +/// let arr: BooleanArray = vec![true, true, false].into(); +/// let values: Vec<_> = arr.iter().collect(); +/// assert_eq!(&values, &[Some(true), Some(true), Some(false)]) Review Comment: I don't think this assert adds anything to the doc example: I think it is already clear from the docs what the array contains. ########## arrow-array/src/array/boolean_array.rs: ########## @@ -27,51 +27,47 @@ use std::sync::Arc; /// An array of [boolean values](https://arrow.apache.org/docs/format/Columnar.html#fixed-size-primitive-layout) /// -/// # Examples +/// # Example: From a Vec /// -/// Construction +/// ``` +/// # use arrow_array::{Array, BooleanArray}; +/// let arr: BooleanArray = vec![true, true, false].into(); +/// let values: Vec<_> = arr.iter().collect(); +/// assert_eq!(&values, &[Some(true), Some(true), Some(false)]) +/// ``` +/// +/// # Example: From an optional Vec /// /// ``` -///# use arrow_array::{Array, BooleanArray}; -/// // Create from Vec<Option<bool>> -/// let arr = BooleanArray::from(vec![Some(false), Some(true), None, Some(true)]); -/// // Create from Vec<bool> -/// let arr = BooleanArray::from(vec![false, true, true]); -/// // Create from iter/collect -/// let arr: BooleanArray = std::iter::repeat(Some(true)).take(10).collect(); +/// # use arrow_array::{Array, BooleanArray}; +/// let arr: BooleanArray = vec![Some(true), None, Some(false)].into(); +/// let values: Vec<_> = arr.iter().collect(); +/// assert_eq!(&values, &[Some(true), None, Some(false)]) /// ``` /// -/// Construction and Access +/// # Example: From an iterator /// /// ``` -/// use arrow_array::{Array, BooleanArray}; -/// let arr = BooleanArray::from(vec![Some(false), Some(true), None, Some(true)]); -/// assert_eq!(4, arr.len()); -/// assert_eq!(1, arr.null_count()); -/// assert!(arr.is_valid(0)); -/// assert!(!arr.is_null(0)); -/// assert_eq!(false, arr.value(0)); -/// assert!(!arr.is_valid(2)); -/// assert!(arr.is_null(2)); +/// # use arrow_array::{Array, BooleanArray}; +/// let arr: BooleanArray = (0..5).map(|x| (x % 2 == 0).then(|| x % 3 == 0)).collect(); +/// let values: Vec<_> = arr.iter().collect(); +/// assert_eq!(&values, &[Some(true), None, Some(false), None, Some(false)]) /// ``` /// -/// Using `collect` +/// # Example: Using Builder +/// /// ``` -/// use arrow_array::{Array, BooleanArray}; -/// let v = vec![Some(false), Some(true), Some(false), Some(true)]; -/// let arr = v.into_iter().collect::<BooleanArray>(); -/// assert_eq!(4, arr.len()); -/// assert_eq!(0, arr.offset()); -/// assert_eq!(0, arr.null_count()); -/// assert!(arr.is_valid(0)); -/// assert_eq!(false, arr.value(0)); -/// assert!(arr.is_valid(1)); -/// assert_eq!(true, arr.value(1)); -/// assert!(arr.is_valid(2)); -/// assert_eq!(false, arr.value(2)); -/// assert!(arr.is_valid(3)); -/// assert_eq!(true, arr.value(3)); +/// # use arrow_array::Array; +/// # use arrow_array::builder::BooleanBuilder; +/// let mut builder = BooleanBuilder::new(); +/// builder.append_value(true); +/// builder.append_null(); +/// builder.append_value(false); +/// let array = builder.finish(); +/// let values: Vec<_> = array.iter().collect(); +/// assert_eq!(&values, &[Some(true), None, Some(false)]) Review Comment: this assert adds value as it shows the equivalent form that is different than the builder's construction' ########## arrow-array/src/array/boolean_array.rs: ########## @@ -27,51 +27,47 @@ use std::sync::Arc; /// An array of [boolean values](https://arrow.apache.org/docs/format/Columnar.html#fixed-size-primitive-layout) /// -/// # Examples +/// # Example: From a Vec /// -/// Construction +/// ``` +/// # use arrow_array::{Array, BooleanArray}; +/// let arr: BooleanArray = vec![true, true, false].into(); +/// let values: Vec<_> = arr.iter().collect(); +/// assert_eq!(&values, &[Some(true), Some(true), Some(false)]) +/// ``` +/// +/// # Example: From an optional Vec /// /// ``` -///# use arrow_array::{Array, BooleanArray}; -/// // Create from Vec<Option<bool>> -/// let arr = BooleanArray::from(vec![Some(false), Some(true), None, Some(true)]); -/// // Create from Vec<bool> -/// let arr = BooleanArray::from(vec![false, true, true]); -/// // Create from iter/collect -/// let arr: BooleanArray = std::iter::repeat(Some(true)).take(10).collect(); +/// # use arrow_array::{Array, BooleanArray}; +/// let arr: BooleanArray = vec![Some(true), None, Some(false)].into(); +/// let values: Vec<_> = arr.iter().collect(); +/// assert_eq!(&values, &[Some(true), None, Some(false)]) /// ``` /// -/// Construction and Access +/// # Example: From an iterator Review Comment: this is a good example ########## arrow-array/src/array/boolean_array.rs: ########## @@ -27,51 +27,47 @@ use std::sync::Arc; /// An array of [boolean values](https://arrow.apache.org/docs/format/Columnar.html#fixed-size-primitive-layout) /// -/// # Examples +/// # Example: From a Vec /// -/// Construction +/// ``` +/// # use arrow_array::{Array, BooleanArray}; +/// let arr: BooleanArray = vec![true, true, false].into(); +/// let values: Vec<_> = arr.iter().collect(); Review Comment: This example seems unnecessarily complicated to me - why would we encourage this pattern instead of what was there ```rust let arr = BooleanArray::from(vec![true, true, false]) ``` ########## arrow-array/src/array/dictionary_array.rs: ########## @@ -30,116 +30,28 @@ use arrow_schema::{ArrowError, DataType}; use std::any::Any; use std::sync::Arc; -/// A dictionary array indexed by `i8` Review Comment: As I explained above, I think the duplication and specific example is very important to make this library to use for new users ########## arrow-array/src/array/byte_array.rs: ########## @@ -34,6 +34,52 @@ use std::sync::Arc; /// /// See [`BinaryArray`] and [`LargeBinaryArray`] for storing arbitrary bytes /// +/// # Example: From a Vec +/// +/// ``` +/// # use arrow_array::{Array, GenericByteArray, types::Utf8Type}; +/// let arr: GenericByteArray<Utf8Type> = vec!["hello", "world", ""].into(); Review Comment: I think this is not what a beginner user would use to create a `StringArray` -- while I realize doing it like this makes the code generic I think it makes figuring out how to make a `StringArray` significantly more complicated for a beginner ########## arrow-array/src/array/primitive_array.rs: ########## @@ -34,202 +34,40 @@ use half::f16; use std::any::Any; use std::sync::Arc; -/// An array of `i8` Review Comment: I think removing the specific examples is a significant reduction in UX as explained above ########## arrow/src/lib.rs: ########## @@ -317,6 +285,36 @@ //! assert_eq!(string.value(1), "foo"); //! ``` //! +//! # Crate Topology Review Comment: I think this is a good change ########## arrow-array/src/array/list_array.rs: ########## @@ -57,6 +57,8 @@ impl OffsetSizeTrait for i64 { /// An array of [variable length arrays](https://arrow.apache.org/docs/format/Columnar.html#variable-size-list-layout) /// /// See [`ListArray`] and [`LargeListArray`]` +/// +/// See [`GenericListBuilder`](crate::builder::GenericListBuilder) for how to construct a [`GenericListArray`] Review Comment: I think this makes sense 👍 ########## arrow-array/src/array/binary_array.rs: ########## @@ -217,84 +217,10 @@ where } } -/// An array of `[u8]` using `i32` offsets -/// -/// The byte length of each element is represented by an i32. Review Comment: I think the duplication is worth the improved usability in this case and here is why: Imagine you are a new users who wants to use something like `DataFusion` to process your data and you figure out "ok, I need a `BinaryArray`" Today, if you go to the docs https://docs.rs/arrow/latest/arrow/array/type.BinaryArray.html you'll get an example you can copy/paste/modify and you are on your way. If instead the documentation simply says ``` /// A [`GenericBinaryArray`] of `[u8]` using `i32` offsets ``` Now you need to click into `GenericByteArray` which will have examples of how to create a `StringArray` (after this PR) https://github.com/apache/arrow-rs/pull/4381/files#diff-6d2cfda31352113d2d964b2e4a3bfa99831739f316f15a92bd77657ca4496c9bR37 Now I have to draw the connection that I can replace `&str` with `u8` and everything is fine. If I know / have experience with arrow this is fine. If I am a new user my initial experience has become significantly more complicated (though perhaps I have learned about how arrow represents binary arrays and string arrays) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
