+Cheng

Hi Reynold,
I think you are referring to bucketing in the in-memory columnar cache. I am proposing that if we have a Parquet structure like the following:

/<parent-directory>/file1/id=1/<parquet-part-files>
/<parent-directory>/file1/id=2/<parquet-part-files>

and we read and cache it, it should create 2 RDD[CachedBatch] (one per value of "id"). Is this what you were referring to originally?

Thanks
-Nitin

On Fri, Nov 25, 2016 at 11:29 AM, Reynold Xin <r...@databricks.com> wrote:
> It's already there, isn't it? The in-memory columnar cache format.
>
> On Thu, Nov 24, 2016 at 9:06 PM, Nitin Goyal <nitin2go...@gmail.com> wrote:
>> Hi,
>>
>> Do we have any plan of supporting parquet-like partitioning support in
>> Spark SQL in-memory cache? Something like one RDD[CachedBatch] per
>> in-memory cache partition.
>>
>> -Nitin

--
Regards
Nitin Goyal
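[Editor's note] The behavior proposed above (one cached group per partition value, derived from the `id=<value>` directory layout) can be illustrated with a toy sketch outside of Spark. This is not Spark's actual CachedBatch machinery; the function and variable names below are purely illustrative:

```python
import os
import tempfile
from collections import defaultdict

def cache_by_partition(root):
    """Group part-file paths by their 'id=<value>' partition directory,
    producing one cache group per partition value (a stand-in for the
    proposed one-RDD[CachedBatch]-per-partition behavior)."""
    caches = defaultdict(list)
    for dirpath, _, filenames in os.walk(root):
        # Partition directories follow the Hive-style 'key=value' convention.
        parts = [p for p in dirpath.split(os.sep) if "=" in p]
        if not parts:
            continue
        key = parts[-1]  # e.g. "id=1"
        for f in filenames:
            caches[key].append(os.path.join(dirpath, f))
    return dict(caches)

# Recreate the layout from the mail: /<parent>/file1/id=1/... and id=2/...
root = tempfile.mkdtemp()
for part in ("id=1", "id=2"):
    d = os.path.join(root, "file1", part)
    os.makedirs(d)
    open(os.path.join(d, "part-00000.parquet"), "w").close()

caches = cache_by_partition(root)
print(sorted(caches))  # two separate cache groups: ['id=1', 'id=2']
```

Caching the two partition values separately, as sketched here, is what would let a filter on "id" skip whole cached groups instead of scanning one flat cache.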