[ https://issues.apache.org/jira/browse/PARQUET-224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393776#comment-14393776 ]
Julien Le Dem commented on PARQUET-224:
---------------------------------------

That sounds doable.
You'd create a new PageStore:
https://github.com/apache/incubator-parquet-mr/blob/master/parquet-column/src/test/java/parquet/column/page/mem/MemPageStore.java
example here of using parquet assembly independently of a file:
https://github.com/apache/incubator-parquet-mr/blob/master/parquet-column/src/test/java/parquet/io/TestColumnIO.java

> Implement writing Parquet files into Cassandra natively
> -------------------------------------------------------
>
>                 Key: PARQUET-224
>                 URL: https://issues.apache.org/jira/browse/PARQUET-224
>             Project: Parquet
>          Issue Type: New Feature
>            Reporter: Issac Buenrostro
>            Priority: Minor
>
> Writing Parquet files into Cassandra could allow parallel writes of multiple
> pages into different cells, and low latency reads with a persistent
> connection to C*.
> Each page could be written to separate C* cells, with metadata written into a
> separate column family.
> A possible way of implementing is:
> - abstract ParquetFileWriter -> ParquetDataWriter. writeDictionaryPage,
> writeDataPage are abstract methods.
> - ParquetFileWriter implements ParquetDataWriter, writing the data to Hadoop
> compatible files.
> - ParquetCassandraWriter implements ParquetDataWriter, writing data to
> Cassandra
> -- for each page, metadata is written to Metadata CF, with key
> <parquet-file-name>:<row-chunk>:<column>:<page>
> -- for each page, data is written to Data CF, with key
> <parquet-file-name>:<row-chunk>:<column>:<page>
> -- footer is written to Metadata CF, with key <parquet-file-name>
> - abstract ParquetFileReader -> ParquetDataReader. readNextRowGroup,
> readFooter are abstract methods. Chunk will also need to be abstract.
> - ParquetFileReader implements ParquetDataReader, reading from Hadoop
> compatible files.
> - ParquetCassandraReader implements ParquetDataReader, reading from Cassandra
> - ParquetDataWriter and ParquetDataReader are instantiated through reflection.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
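
For illustration only, the key layout proposed in the issue description could be sketched roughly as below. This is not parquet-mr code: the ParquetDataWriter method signatures, the InMemoryCassandraWriter class, and the "dict" page label for dictionary pages are all assumptions made for the example, and two in-memory maps stand in for the Metadata CF and Data CF so that no Cassandra driver is needed.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical abstraction named in the proposal; the signatures are assumed.
interface ParquetDataWriter {
    void writeDictionaryPage(int rowChunk, String column, byte[] pageMetadata, byte[] data);
    void writeDataPage(int rowChunk, String column, int page, byte[] pageMetadata, byte[] data);
    void writeFooter(byte[] footer);
}

// Stand-in for ParquetCassandraWriter: maps play the role of column families.
final class InMemoryCassandraWriter implements ParquetDataWriter {
    final Map<String, byte[]> metadataCf = new LinkedHashMap<>();
    final Map<String, byte[]> dataCf = new LinkedHashMap<>();
    private final String fileName;

    InMemoryCassandraWriter(String fileName) {
        this.fileName = fileName;
    }

    // Builds <parquet-file-name>:<row-chunk>:<column>:<page> as in the proposal.
    String pageKey(int rowChunk, String column, String page) {
        return fileName + ":" + rowChunk + ":" + column + ":" + page;
    }

    @Override
    public void writeDictionaryPage(int rowChunk, String column, byte[] pageMetadata, byte[] data) {
        // The proposal gives no key for dictionary pages; "dict" is an assumption.
        String key = pageKey(rowChunk, column, "dict");
        metadataCf.put(key, pageMetadata);
        dataCf.put(key, data);
    }

    @Override
    public void writeDataPage(int rowChunk, String column, int page, byte[] pageMetadata, byte[] data) {
        // Same key in both column families: metadata in one, page bytes in the other.
        String key = pageKey(rowChunk, column, Integer.toString(page));
        metadataCf.put(key, pageMetadata);
        dataCf.put(key, data);
    }

    @Override
    public void writeFooter(byte[] footer) {
        // The footer is stored in the Metadata CF under the bare file name.
        metadataCf.put(fileName, footer);
    }
}

final class CassandraLayoutSketch {
    public static void main(String[] args) {
        InMemoryCassandraWriter writer = new InMemoryCassandraWriter("events.parquet");
        writer.writeDataPage(0, "user_id", 0, new byte[] {1}, new byte[] {10, 11});
        writer.writeDataPage(0, "user_id", 1, new byte[] {2}, new byte[] {12, 13});
        writer.writeFooter(new byte[] {42});
        for (String key : writer.metadataCf.keySet()) {
            System.out.println("Metadata CF key: " + key);
        }
        for (String key : writer.dataCf.keySet()) {
            System.out.println("Data CF key: " + key);
        }
    }
}
```

Because every page gets its own key, pages of different columns or row chunks could in principle be written in parallel, which is the benefit the description mentions. A real writer would replace the maps with Cassandra writes.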