Storing large binary blobs directly in a Hive table is not good practice. Store only a
reference (e.g., the HDFS path) to the binary data in the table, and keep the files
themselves on HDFS.
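
For illustration, here is a minimal PySpark sketch of that approach. The table name,
paths, and schema are hypothetical, and the binaryFile source assumes Spark 3.0+:

# Minimal sketch, assuming the images already live on HDFS and we only
# persist their paths plus lightweight metadata in the Hive table.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("image-metadata")
         .enableHiveSupport()
         .getOrCreate())

# Hypothetical metadata rows: id, label, and the HDFS path of each image.
rows = [
    (1, "cat", "hdfs:///data/images/img_00001.jpg"),
    (2, "dog", "hdfs:///data/images/img_00002.jpg"),
]
df = spark.createDataFrame(rows, ["id", "label", "image_path"])

# Store only the small columns in Hive; the binary payload stays on HDFS.
df.write.mode("overwrite").saveAsTable("images_meta")

# Metadata queries stay fast; the binary data is read only for the rows
# you actually select, e.g. via Spark's binaryFile source.
paths = [r.image_path for r in
         spark.table("images_meta").filter("label = 'cat'").collect()]
images = spark.read.format("binaryFile").load(paths)

This keeps the Hive table small, so scans and joins over the metadata columns are cheap,
and the large image bytes are only fetched on demand.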
> On 09.01.2022 at 15:34, weoccc wrote:
>
>
> Hi,
>
> I want to store binary data (such as images) in a Hive table, but the binary
> data column may be much larger than the other columns in each row. I'm worried
> about query performance. One way I can think of is to separate the binary
> data from the other columns by creating 2 Hive tables, running 2 separate
> Spark queries, and joining the results later.
>
> Later, I found that Parquet supports splitting columns into different files, as
> described here:
> https://parquet.apache.org/documentation/latest/
>
> I'm wondering if Spark SQL already supports that? If so, how do I use it?
>
> Weide