[ https://issues.apache.org/jira/browse/PARQUET-224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393776#comment-14393776 ]

Julien Le Dem commented on PARQUET-224:
---------------------------------------

That sounds doable.
You'd create a new PageStore:
https://github.com/apache/incubator-parquet-mr/blob/master/parquet-column/src/test/java/parquet/column/page/mem/MemPageStore.java

Here is an example of using Parquet assembly independently of a file:
https://github.com/apache/incubator-parquet-mr/blob/master/parquet-column/src/test/java/parquet/io/TestColumnIO.java
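To make the PageStore idea concrete, here is a toy, self-contained sketch of an in-memory page store in the spirit of MemPageStore. The class and method names are illustrative only, not the real parquet-column API; the real interface deals in Page objects and column descriptors rather than raw byte arrays.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy in-memory page store: maps a column path to the list of raw pages
// written for that column, mimicking what MemPageStore does in tests.
// (Hypothetical sketch; not the parquet-column interface.)
class ToyPageStore {
    private final Map<String, List<byte[]>> pages = new HashMap<>();

    // Append one encoded page for the given column path.
    void writePage(String columnPath, byte[] pageBytes) {
        pages.computeIfAbsent(columnPath, k -> new ArrayList<>()).add(pageBytes);
    }

    // Return the pages written for a column, in write order.
    List<byte[]> readPages(String columnPath) {
        return pages.getOrDefault(columnPath, List.of());
    }
}
```

The point is that nothing in the page-level read/write contract requires a Hadoop file; any backing store that can persist and return pages per column would do.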

> Implement writing Parquet files into Cassandra natively
> -------------------------------------------------------
>
>                 Key: PARQUET-224
>                 URL: https://issues.apache.org/jira/browse/PARQUET-224
>             Project: Parquet
>          Issue Type: New Feature
>            Reporter: Issac Buenrostro
>            Priority: Minor
>
> Writing Parquet files into Cassandra could allow parallel writes of multiple 
> pages into different cells, and low latency reads with a persistent 
> connection to C*.
> Each page could be written to separate C* cells, with metadata written into a 
> separate column family.
> A possible way of implementing this is:
> - abstract ParquetFileWriter -> ParquetDataWriter. writeDictionaryPage, 
> writeDataPage are abstract methods.
> - ParquetFileWriter implements ParquetDataWriter, writing the data to Hadoop 
> compatible files.
> - ParquetCassandraWriter implements ParquetDataWriter, writing data to 
> Cassandra.
> -- for each page, metadata is written to Metadata CF, with key 
> <parquet-file-name>:<row-chunk>:<column>:<page>
> -- for each page, data is written to Data CF, with key 
> <parquet-file-name>:<row-chunk>:<column>:<page>
> -- footer is written to Metadata CF, with key <parquet-file-name>.
> - abstract ParquetFileReader -> ParquetDataReader. readNextRowGroup, 
> readFooter are abstract methods. Chunk will also need to be abstract.
> - ParquetFileReader implements ParquetDataReader, reading from Hadoop 
> compatible files.
> - ParquetCassandraReader implements ParquetDataReader, reading from Cassandra.
> - ParquetDataWriter and ParquetDataReader are instantiated through reflection.
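The writer split proposed above can be sketched as follows. All names and signatures here are hypothetical, following the issue description rather than the actual ParquetFileWriter API, and a plain Map stands in for a Cassandra session; the ":dict" suffix on dictionary-page keys is my own assumption, added so dictionary and data pages with the same page index do not collide.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical abstraction per the proposal: the page-writing methods
// are abstract, so backends other than Hadoop files can be plugged in.
abstract class ParquetDataWriter {
    abstract void writeDictionaryPage(String column, int rowChunk, int page, byte[] bytes);
    abstract void writeDataPage(String column, int rowChunk, int page, byte[] bytes);
}

// Cassandra-backed sketch: each page becomes one cell in a Data CF,
// keyed <parquet-file-name>:<row-chunk>:<column>:<page>. A Map stands
// in for the actual C* session.
class ParquetCassandraWriter extends ParquetDataWriter {
    final Map<String, byte[]> dataCf = new LinkedHashMap<>();
    private final String fileName;

    ParquetCassandraWriter(String fileName) { this.fileName = fileName; }

    private String key(int rowChunk, String column, int page) {
        return fileName + ":" + rowChunk + ":" + column + ":" + page;
    }

    @Override
    void writeDictionaryPage(String column, int rowChunk, int page, byte[] bytes) {
        // ":dict" suffix is an assumption beyond the issue text, to keep
        // dictionary pages from overwriting data pages with the same index.
        dataCf.put(key(rowChunk, column, page) + ":dict", bytes);
    }

    @Override
    void writeDataPage(String column, int rowChunk, int page, byte[] bytes) {
        dataCf.put(key(rowChunk, column, page), bytes);
    }
}
```

A Hadoop-file implementation would extend the same abstract class and append pages to a file instead, which is what makes the reflection-based instantiation at the end of the proposal workable.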



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)