Re: [ compress in-memory column storage used in sparksql cache table ]

Cheng Lian Wed, 02 Sep 2015 01:45:10 -0700

Yeah, two of the reasons why the built-in in-memory columnar storage doesn't achieve a compression ratio comparable to Parquet's are:

1. The in-memory columnar representation doesn't handle nested types, so array/map/struct values are not compressed.
2. Parquet may apply more than one compression method to a single column, for example dictionary encoding followed by RLE.
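To make point 2 concrete, here is a toy Python sketch (not Spark's or Parquet's code) that layers dictionary encoding and RLE on a single string column and compares estimated sizes. The function names and the cost model (4-byte integers for codes and run lengths, each string costing its UTF-8 length) are hypothetical assumptions for illustration; real formats use narrower and more elaborate encodings, so only the relative ordering of the numbers means anything.

```python
from typing import Dict, List, Sequence, Tuple

INT_BYTES = 4  # assumption: codes and run lengths cost 4 bytes each


def str_bytes(s: str) -> int:
    """Assumed cost of storing one string: its UTF-8 length."""
    return len(s.encode("utf-8"))


def run_length_encode(values: Sequence) -> List[Tuple[object, int]]:
    """Collapse consecutive repeats into (value, run_length) pairs."""
    runs: List[Tuple[object, int]] = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)
        else:
            runs.append((v, 1))
    return runs


def dictionary_encode(values: Sequence[str]) -> Tuple[List[str], List[int]]:
    """Store each distinct value once and replace every value by an integer code."""
    dictionary: List[str] = []
    index: Dict[str, int] = {}
    codes: List[int] = []
    for v in values:
        if v not in index:
            index[v] = len(dictionary)
            dictionary.append(v)
        codes.append(index[v])
    return dictionary, codes


def dictionary_then_rle(values: Sequence[str]) -> Tuple[List[str], List[Tuple[object, int]]]:
    """Layer the two schemes: dictionary-encode, then run-length-encode the codes."""
    dictionary, codes = dictionary_encode(values)
    return dictionary, run_length_encode(codes)


def decode_dictionary_rle(dictionary: List[str], code_runs: List[Tuple[object, int]]) -> List[str]:
    """Invert dictionary_then_rle, showing the layering is lossless."""
    return [dictionary[code] for code, length in code_runs for _ in range(length)]


def encoded_sizes(values: Sequence[str]) -> Dict[str, int]:
    """Estimated byte cost of each scheme under the toy cost model above."""
    runs = run_length_encode(values)
    dictionary, codes = dictionary_encode(values)
    _, code_runs = dictionary_then_rle(values)
    dict_bytes = sum(str_bytes(v) for v in dictionary)
    return {
        "plain": sum(str_bytes(v) for v in values),
        "rle": sum(str_bytes(v) + INT_BYTES for v, _ in runs),
        "dictionary": dict_bytes + INT_BYTES * len(codes),
        "dictionary+rle": dict_bytes + 2 * INT_BYTES * len(code_runs),
    }


if __name__ == "__main__":
    # Few distinct, fairly long values arranged in runs: 24 rows, 6 runs.
    column = (["Mountain View"] * 4 + ["San Francisco"] * 4) * 3
    assert decode_dictionary_rle(*dictionary_then_rle(column)) == column
    print(encoded_sizes(column))
    # {'plain': 312, 'rle': 102, 'dictionary': 122, 'dictionary+rle': 74}
```

In this toy model, RLE alone still stores each string once per run, and dictionary encoding alone still stores one code per row; stacking them removes both kinds of redundancy. A store that picks only one scheme per column would stop at the better of the two single-scheme results.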


Cheng

On 9/2/15 3:58 PM, Nitin Goyal wrote:

I think Spark SQL's in-memory columnar cache already does compression. Check
out the classes under the following path:

https://github.com/apache/spark/tree/master/sql/core/src/main/scala/org/apache/spark/sql/columnar/compression

The compression ratio is not as good as Parquet's, though.

Thanks
-Nitin



--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/compress-in-memory-column-storage-used-in-sparksql-cache-table-tp13932p13937.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
