It is not good practice to store large binary data in the table itself. Keep the
files on HDFS and store only a reference (the HDFS path) in the Hive table.
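
A minimal sketch of that pattern, assuming Spark 3.0 or later (for the built-in
binaryFile data source); the table name image_meta, its columns, and the
hdfs:///data/images paths are illustrative, not from this thread:

import org.apache.spark.sql.SparkSession

object ImageRefExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("image-ref-example")
      .enableHiveSupport()
      .getOrCreate()
    import spark.implicits._

    // Keep only lightweight metadata plus the HDFS path in Hive
    // (hypothetical table and paths, for illustration).
    val meta = Seq(
      (1L, "cat.jpg", "hdfs:///data/images/cat.jpg"),
      (2L, "dog.jpg", "hdfs:///data/images/dog.jpg")
    ).toDF("id", "name", "path")
    meta.write.mode("overwrite").saveAsTable("image_meta")

    // Load the bytes only for rows that survive the filter, via the
    // built-in binaryFile source (available since Spark 3.0).
    val wanted = spark.table("image_meta").filter($"id" === 1L)
    val bytes = spark.read.format("binaryFile")
      .load(wanted.select("path").as[String].collect(): _*)
    bytes.select("path", "length").show(false)
  }
}

This way scans of the metadata table never touch the image bytes; the content
is read from HDFS only for the rows you actually need.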

> On 09.01.2022 at 15:34, weoccc <weo...@gmail.com> wrote:
> 
> 
> Hi ,
> 
> I want to store binary data (such as images) in a Hive table, but the binary 
> data column might be much larger than the other columns in each row. I'm worried 
> about the query performance. One way I can think of is to separate the binary 
> data from the other columns by creating 2 Hive tables, running 2 separate 
> Spark queries, and joining them later. 
> 
> Later, I found that Parquet supports splitting columns into different files, 
> as described here: 
> https://parquet.apache.org/documentation/latest/
> 
> I'm wondering if Spark SQL already supports that? If so, how do I use it? 
> 
> Weide 
