Thanks for the email. Can you explain what the difference is between this and existing formats such as Parquet/ORC?
On Wed, Nov 11, 2015 at 4:59 AM, Cristian O <[email protected]> wrote:
> Hi,
>
> I was wondering if there's any planned support for local disk columnar
> storage.
>
> This could be an extension of the in-memory columnar store, or possibly
> something similar to the recently added local checkpointing for RDDs.
>
> This could also have the added benefit of enabling iterative usage for
> DataFrames by pruning the query plan through local checkpoints.
>
> A further enhancement would be to add update support to the columnar
> format (in the immutable copy-on-write sense of course), by maintaining
> references to unchanged row blocks and only copying and mutating the ones
> that have changed.
>
> A use case here is streaming and merging updates in a large dataset that
> can be efficiently stored internally in a columnar format, rather than
> accessing a more inefficient external data store like HDFS or Cassandra.
>
> Thanks,
> Cristian
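The copy-on-write update idea in the quoted message (keep references to unchanged row blocks, copy and modify only the blocks that change) can be illustrated with a minimal Python sketch. This is not Spark's in-memory columnar format or any existing Spark API. The names `Block`, `ColumnarTable` and `BLOCK_SIZE` are hypothetical, and a real implementation would also need encoding, compression and on-disk persistence.

```python
from dataclasses import dataclass
from typing import Any, Dict, Tuple

BLOCK_SIZE = 4  # hypothetical rows per block, kept tiny for illustration


@dataclass(frozen=True)
class Block:
    """An immutable row block stored column-wise as (name, values) pairs."""
    columns: Tuple[Tuple[str, Tuple[Any, ...]], ...]

    def column(self, name: str) -> Tuple[Any, ...]:
        return dict(self.columns)[name]

    def with_value(self, name: str, offset: int, value: Any) -> "Block":
        # Copy only this block, replacing a single cell in one column.
        new_cols = []
        for col_name, values in self.columns:
            if col_name == name:
                values = values[:offset] + (value,) + values[offset + 1:]
            new_cols.append((col_name, values))
        return Block(tuple(new_cols))


@dataclass(frozen=True)
class ColumnarTable:
    """An immutable table made of row blocks; each version shares blocks."""
    blocks: Tuple[Block, ...]

    @staticmethod
    def from_columns(data: Dict[str, list]) -> "ColumnarTable":
        n = len(next(iter(data.values())))
        blocks = []
        for start in range(0, n, BLOCK_SIZE):
            cols = tuple(
                (name, tuple(vals[start:start + BLOCK_SIZE]))
                for name, vals in data.items()
            )
            blocks.append(Block(cols))
        return ColumnarTable(tuple(blocks))

    def update(self, row: int, name: str, value: Any) -> "ColumnarTable":
        idx, offset = divmod(row, BLOCK_SIZE)
        new_block = self.blocks[idx].with_value(name, offset, value)
        # All other blocks are reused by reference, not copied.
        return ColumnarTable(
            self.blocks[:idx] + (new_block,) + self.blocks[idx + 1:]
        )

    def column(self, name: str) -> list:
        return [v for block in self.blocks for v in block.column(name)]


v1 = ColumnarTable.from_columns({"id": list(range(10)), "score": [0] * 10})
v2 = v1.update(5, "score", 42)
```

Here `v2` differs from `v1` only in the second block (rows 4 to 7). The first and third blocks are the same objects in both versions, so an update costs one block copy rather than a rewrite of the whole dataset, and the old version stays readable.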
