RE: AVRO File size when caching in-memory

2016-11-16 Thread Shreya Agarwal
.@microsoft.com>; user@spark.apache.org Subject: Re: AVRO File size when caching in-memory It's something like the schema shown below (with several additional levels/sublevels) root |-- sentAt: long (nullable = true) |-- sharing: string (nullable = tru

Re: AVRO File size when caching in-memory

2016-11-16 Thread Prithish
nd how well it was compressible. >>> The purpose of these formats is to store data to persistent storage in a way that's faster to read from, not to reduce cache-memory usage. >>>

Re: AVRO File size when caching in-memory

2016-11-16 Thread Takeshi Yamamuro
faster to read from, not to reduce cache-memory usage. >> Maybe others here have more info to share. >> Regards, >> Shreya >> Sent from my Windows 10 phone >>

Re: AVRO File size when caching in-memory

2016-11-16 Thread Prithish
be others here have more info to share. > Regards, > Shreya > Sent from my Windows 10 phone > From: Prithish <prith...@gmail.com> > Sent: Tuesday, November 15, 2016 11:04 PM > To: Shreya Agarwal <shrey...@microsoft.

RE: AVRO File size when caching in-memory

2016-11-15 Thread Shreya Agarwal
, Shreya Sent from my Windows 10 phone From: Prithish <prith...@gmail.com> Sent: Tuesday, November 15, 2016 11:04 PM To: Shreya Agarwal <shrey...@microsoft.com> Subject: Re: AVRO File size when caching in-memory I did another test and am noting my observations here. The

Re: AVRO File size when caching in-memory

2016-11-15 Thread Prithish
Anyone? On Tue, Nov 15, 2016 at 10:45 AM, Prithish wrote: > I am using 2.0.1 and databricks avro library 3.0.1. I am running this on > the latest AWS EMR release. > > On Mon, Nov 14, 2016 at 3:06 PM, Jörn Franke wrote: > >> spark version? Are you using

Re: AVRO File size when caching in-memory

2016-11-14 Thread Prithish
I am using 2.0.1 and databricks avro library 3.0.1. I am running this on the latest AWS EMR release. On Mon, Nov 14, 2016 at 3:06 PM, Jörn Franke wrote: > spark version? Are you using tungsten? > > > On 14 Nov 2016, at 10:05, Prithish wrote: > > > >

Re: AVRO File size when caching in-memory

2016-11-14 Thread Jörn Franke
Spark version? Are you using Tungsten? > On 14 Nov 2016, at 10:05, Prithish wrote: > > Can someone please explain why this happens? > > When I read a 600 KB AVRO file and cache it in memory (using cacheTable), it > shows up as 11 MB (Storage tab in Spark UI). I have tried