Do you want to use `Dataset.persist(StorageLevel.MEMORY_AND_DISK)`?
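
For example, something like this (an untested sketch; the JDBC URL and credentials are placeholders, the `region` table name is taken from your plan):

import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

val spark = SparkSession.builder().appName("tpch").getOrCreate()

// Read the `region` table over JDBC (url/user/password are placeholders).
val region = spark.read
  .format("jdbc")
  .option("url", "jdbc:mysql://host:3306/tpch")
  .option("dbtable", "region")
  .option("user", "...")
  .option("password", "...")
  .load()

// MEMORY_AND_DISK keeps cached partitions in memory when they fit and
// spills the rest to local disk instead of dropping them.
region.persist(StorageLevel.MEMORY_AND_DISK)
region.createOrReplaceTempView("region")

Note that in Spark 2.x, Dataset.cache() already defaults to MEMORY_AND_DISK, which matches the StorageLevel(disk, memory, 1 replicas) shown in the plan below.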

On Thu, Apr 12, 2018 at 1:12 PM, Louis Hust <louis.h...@gmail.com> wrote:

> We want to extract data from MySQL and run the computation in Spark SQL.
> The explain output of the SQL looks like below.
>
>
> REGIONKEY#177,N_COMMENT#178] PushedFilters: [], ReadSchema: struct<N_NATIONKEY:int,N_NAME:string,N_REGIONKEY:int,N_COMMENT:string>
>                   +- *(20) Sort [r_regionkey#203 ASC NULLS FIRST], false, 0
>                      +- Exchange(coordinator id: 266374831) hashpartitioning(r_regionkey#203, 200), coordinator[target post-shuffle partition size: 67108864]
>                         +- *(19) Project [R_REGIONKEY#203]
>                            +- *(19) Filter ((isnotnull(r_name#204) && (r_name#204 = AFRICA)) && isnotnull(r_regionkey#203))
>                               +- InMemoryTableScan [R_REGIONKEY#203, r_name#204], [isnotnull(r_name#204), (r_name#204 = AFRICA), isnotnull(r_regionkey#203)]
>                                     +- InMemoryRelation [R_REGIONKEY#203, R_NAME#204, R_COMMENT#205], true, 10000, StorageLevel(disk, memory, 1 replicas)
>                                           +- *(1) Scan JDBCRelation(region) [numPartitions=1] [R_REGIONKEY#203,R_NAME#204,R_COMMENT#205] PushedFilters: [], ReadSchema: struct<R_REGIONKEY:int,R_NAME:string,R_COMMENT:string>
>
>
> As you can see, the JDBCRelation is converted to an InMemoryRelation. Because
> the JDBC table is so big, all the data cannot fit into memory, and an OOM occurs.
> Is there some option to make Spark SQL use disk when memory is not enough?
>



-- 
---
Takeshi Yamamuro
