XiangpengHao commented on code in PR #5885:
URL: https://github.com/apache/arrow-rs/pull/5885#discussion_r1639101720


##########
arrow-array/src/array/byte_view_array.rs:
##########
@@ -265,6 +265,56 @@ impl<T: ByteViewType + ?Sized> GenericByteViewArray<T> {
             phantom: Default::default(),
         }
     }
+
+    /// Returns a buffer compact version of this array
+    ///
+    /// The original array will *not* be modified
+    ///
+    /// # Garbage Collection
+    ///
+    /// Before GC:
+    /// ```text
+    ///                                        ┌──────┐                 
+    ///                                        │......│                 
+    ///                                        │......│                 
+    /// ┌────────────────────┐       ┌ ─ ─ ─ ▶ │Data1 │   Large buffer  
+    /// │       View 1       │─ ─ ─ ─          │......│  with data that
+    /// ├────────────────────┤                 │......│ is not referred
+    /// │       View 2       │─ ─ ─ ─ ─ ─ ─ ─▶ │Data2 │ to by View 1 or
+    /// └────────────────────┘                 │......│      View 2     
+    ///                                        │......│                 
+    ///    2 views, refer to                   │......│                 
+    ///   small portions of a                  └──────┘                 
+    ///      large buffer                                               
+    /// ```
+    ///                                                                
+    /// After GC:
+    ///
+    /// ```text
+    /// ┌────────────────────┐                 ┌─────┐    After gc, only
+    /// │       View 1       │─ ─ ─ ─ ─ ─ ─ ─▶ │Data1│     data that is  
+    /// ├────────────────────┤       ┌ ─ ─ ─ ▶ │Data2│    pointed to by  
+    /// │       View 2       │─ ─ ─ ─          └─────┘     the views is  
+    /// └────────────────────┘                                 left      
+    ///                                                                  
+    ///                                                                  
+    ///         2 views                                                  
+    /// ```
+    /// This method will compact the data buffers by recreating the view array 
and only include the data
+    /// that is pointed to by the views.
+    ///
+    /// Note that it will copy the array regardless of whether the original 
array is compact.
+    /// Use with caution as this can be an expensive operation, only use it 
when you are sure that the view
+    /// array is significantly smaller than when it is originally created, 
e.g., after filtering or slicing.
+    pub fn gc(&self) -> Self {
+        let mut builder = 
GenericByteViewBuilder::<T>::with_capacity(self.len());
+
+        for v in self.iter() {
+            builder.append_option(v);

Review Comment:
   I'm not sure I understand how the checks can be avoided; even if the view is 
inlined, we need to change the offset and the block id, so we need to copy that 
u128 from the original view buffer to the new buffer anyway.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to