[ 
https://issues.apache.org/jira/browse/SPARK-48587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Cashman updated SPARK-48587:
----------------------------------
    Description: When a variant_get expression returns a Variant, or a nested 
type containing Variant, we just return the sub-slice of the Variant value 
along with the full metadata, even though most of the metadata is probably 
unnecessary to represent the value. This may be very inefficient if the value 
is then written to disk (e.g., a shuffle file or Parquet). We should instead 
rebuild the value with minimal metadata.  (was: When a variant_get expression 
returns a Variant, or a nested type containing Variant, we just return the 
sub-slice of the Variant value along with the full metadata, even though most 
of the metadata is probably unnecessary to represent the value. We should 
instead rebuild the value with minimal metadata.)

> Avoid storage amplification when accessing sub-Variant
> ------------------------------------------------------
>
>                 Key: SPARK-48587
>                 URL: https://issues.apache.org/jira/browse/SPARK-48587
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 4.0.0
>            Reporter: David Cashman
>            Priority: Major
>
> When a variant_get expression returns a Variant, or a nested type containing 
> Variant, we just return the sub-slice of the Variant value along with the 
> full metadata, even though most of the metadata is probably unnecessary to 
> represent the value. This may be very inefficient if the value is then 
> written to disk (e.g., a shuffle file or Parquet). We should instead rebuild the 
> value with minimal metadata.
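
To illustrate the amplification described above, here is a minimal sketch using a toy model, not Spark's actual Variant binary encoding or API. It assumes only that the metadata is a dictionary of field names and that object fields refer to those names by integer id. The names `ToyVariant`, `get_field`, and `rebuild_minimal` are hypothetical and exist only for this example.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ToyVariant:
    """Toy stand-in for a Variant (an assumption, not Spark's layout)."""
    metadata: tuple  # key dictionary: position -> field name
    value: Any       # dict {key_id: child}, list of children, or scalar


def get_field(v: ToyVariant, name: str) -> ToyVariant:
    """Mimic the reported behaviour: sub-slice of the value, full metadata."""
    key_id = v.metadata.index(name)
    return ToyVariant(v.metadata, v.value[key_id])


def rebuild_minimal(v: ToyVariant) -> ToyVariant:
    """Re-encode so the metadata holds only keys the value references."""
    used = set()

    def collect(node):
        # Walk the value and record every key id that is actually used.
        if isinstance(node, dict):
            for key_id, child in node.items():
                used.add(key_id)
                collect(child)
        elif isinstance(node, list):
            for child in node:
                collect(child)

    collect(v.value)
    names = sorted(v.metadata[i] for i in used)
    new_id = {old: names.index(v.metadata[old]) for old in used}

    def remap(node):
        # Rewrite key ids so they index into the smaller dictionary.
        if isinstance(node, dict):
            return {new_id[k]: remap(c) for k, c in node.items()}
        if isinstance(node, list):
            return [remap(c) for c in node]
        return node

    return ToyVariant(tuple(names), remap(v.value))


# Models {"a": 1, "b": {"d": "x"}, "c": [1, 2], "e": true}
doc = ToyVariant(("a", "b", "c", "d", "e"),
                 {0: 1, 1: {3: "x"}, 2: [1, 2], 4: True})
sub = get_field(doc, "b")     # still carries all five keys
small = rebuild_minimal(sub)  # carries only "d"
print(len(sub.metadata), len(small.metadata))  # prints: 5 1
```

In this toy model the extracted sub-value references one key but drags along a five-entry dictionary. Rebuilding trades an extra pass over the value for a smaller payload when it is written out.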



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
