shanesveller opened a new issue, #6384:
URL: https://github.com/apache/arrow-rs/issues/6384

   **Describe the bug**
   As of Arrow 53.0.0, and on mainline commit 
f050ff7a1946858d500b21a2c7d1b2c24fc2c753, reusing a `StringViewBuilder` with 
deduplication enabled panics when I append a previously observed value after 
calling `StringViewBuilder::finish` (rather than `finish_cloned`).
   
   **To Reproduce**
   
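   ```rust
   use arrow_array::builder::StringViewBuilder;

   // Deduplication must be enabled to hit the panic.
   let mut builder = StringViewBuilder::new().with_deduplicate_strings();

   let value_1 = "long string to test string view";
   builder.append_value(value_1);
   let _array = builder.finish();
   // Appending a value already seen before `finish()` triggers the panic.
   builder.append_value(value_1);
   let _array = builder.finish();
   ```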
   
   **Expected behavior**
   
   `StringViewBuilder::finish()` should leave me with a clean state with which 
I may continue to append new values, even if they have been previously observed 
**by this builder**.
   
   **Additional context**
   
   <details><summary>Possible patch, pending maintainer feedback on the PR I 
will open with the same change</summary>
   
   ```diff
   diff --git a/arrow-array/src/builder/generic_bytes_view_builder.rs b/arrow-array/src/builder/generic_bytes_view_builder.rs
   index deaf447d..3a9cf17c 100644
   --- a/arrow-array/src/builder/generic_bytes_view_builder.rs
   +++ b/arrow-array/src/builder/generic_bytes_view_builder.rs
   @@ -368,6 +368,9 @@ impl<T: ByteViewType + ?Sized> GenericByteViewBuilder<T> {
            let len = self.views_builder.len();
            let views = ScalarBuffer::new(self.views_builder.finish(), 0, len);
            let nulls = self.null_buffer_builder.finish();
   +        if let Some((ref mut ht, _)) = self.string_tracker.as_mut() {
   +            ht.clear();
   +        }
            // SAFETY: valid by construction
            unsafe { GenericByteViewArray::new_unchecked(views, completed, nulls) }
        }
   @@ -590,6 +593,20 @@ mod tests {
            assert_eq!(array.views().get(1), array.views().get(5));
        }
    
   +    #[test]
   +    fn test_string_view_deduplicate_after_finish() {
   +        let mut builder = StringViewBuilder::new().with_deduplicate_strings();
   +
   +        let value_1 = "long string to test string view";
   +        let value_2 = "not so similar string but long";
   +        builder.append_value(value_1);
   +        let _array = builder.finish();
   +        builder.append_value(value_2);
   +        let _array = builder.finish();
   +        builder.append_value(value_1);
   +        let _array = builder.finish();
   +    }
   +
        #[test]
        fn test_string_view() {
            let b1 = Buffer::from(b"world\xFFbananas\xF0\x9F\x98\x81");
   ```
   </details>
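
To illustrate why clearing the tracker in `finish()` matters, here is a minimal std-only sketch. It is a toy model, not the arrow-rs implementation: `ToyDedupBuilder`, its fields, the `hash_of` helper, and the `clear_on_finish` flag are all hypothetical names, and I am assuming the real failure has the same shape, namely that a tracker entry recorded before `finish()` points at storage that `finish()` has already handed off.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hash a string with the standard library's default hasher.
fn hash_of(s: &str) -> u64 {
    let mut h = DefaultHasher::new();
    s.hash(&mut h);
    h.finish()
}

/// Hypothetical toy builder with value deduplication (not arrow-rs code).
struct ToyDedupBuilder {
    /// Distinct values stored since the last `finish()`.
    values: Vec<String>,
    /// One entry per appended element: an index into `values`.
    views: Vec<usize>,
    /// Dedup tracker: hash of a value -> index into `values`.
    tracker: HashMap<u64, usize>,
    /// Toggle that stands in for the proposed patch.
    clear_on_finish: bool,
}

impl ToyDedupBuilder {
    fn new(clear_on_finish: bool) -> Self {
        Self {
            values: Vec::new(),
            views: Vec::new(),
            tracker: HashMap::new(),
            clear_on_finish,
        }
    }

    fn append_value(&mut self, v: &str) {
        let h = hash_of(v);
        if let Some(&idx) = self.tracker.get(&h) {
            // Confirm the hit by reading the stored value. If `idx` was
            // recorded before a `finish()`, it is stale and this index panics.
            if self.values[idx] == v {
                self.views.push(idx);
                return;
            }
        }
        let idx = self.values.len();
        self.values.push(v.to_owned());
        self.tracker.insert(h, idx);
        self.views.push(idx);
    }

    /// Hand off everything built so far and reset the storage.
    fn finish(&mut self) -> Vec<String> {
        let values = std::mem::take(&mut self.values);
        let views = std::mem::take(&mut self.views);
        if self.clear_on_finish {
            // The fix: drop tracker entries that point into the old storage.
            self.tracker.clear();
        }
        views.into_iter().map(|i| values[i].clone()).collect()
    }
}
```

With `ToyDedupBuilder::new(false)`, appending the same value after `finish()` panics on the stale index; with `ToyDedupBuilder::new(true)`, the builder can be reused across any number of `finish()` calls.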

