Issac Buenrostro created PARQUET-224:
----------------------------------------
             Summary: Implement writing Parquet files into Cassandra natively
                 Key: PARQUET-224
                 URL: https://issues.apache.org/jira/browse/PARQUET-224
             Project: Parquet
          Issue Type: New Feature
            Reporter: Issac Buenrostro
            Priority: Minor


Writing Parquet files into Cassandra could allow parallel writes of multiple pages into different cells, and low-latency reads over a persistent connection to C*. Each page could be written to a separate C* cell, with metadata written into a separate column family.

A possible way of implementing this:

- Abstract ParquetFileWriter into ParquetDataWriter; writeDictionaryPage and writeDataPage are abstract methods.
- ParquetFileWriter implements ParquetDataWriter, writing the data to Hadoop-compatible files.
- ParquetCassandraWriter implements ParquetDataWriter, writing data to Cassandra:
-- for each page, metadata is written to the Metadata CF, with key <parquet-file-name>:<row-chunk>:<column>:<page>
-- for each page, data is written to the Data CF, with key <parquet-file-name>:<row-chunk>:<column>:<page>
-- the footer is written to the Metadata CF, with key <parquet-file-name>
- Abstract ParquetFileReader into ParquetDataReader; readNextRowGroup and readFooter are abstract methods. Chunk will also need to be abstract.
- ParquetFileReader implements ParquetDataReader, reading from Hadoop-compatible files.
- ParquetCassandraReader implements ParquetDataReader, reading from Cassandra.
- ParquetDataWriter and ParquetDataReader are instantiated through reflection.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
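The proposed abstraction and key scheme could be sketched roughly as below. Only the names ParquetDataWriter, ParquetCassandraWriter, writeDataPage, and writeDictionaryPage come from the proposal; the method signatures, the byte[] payloads, and the in-memory Map standing in for the Data CF are illustrative assumptions, not a real Parquet or Cassandra API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: page-level writes are abstract so that a
// Cassandra-backed writer can route each page to its own cell.
abstract class ParquetDataWriter {
    // In the real proposal these would take page objects; byte[] stands in here.
    abstract void writeDataPage(String column, int rowChunk, int page, byte[] bytes);
    abstract void writeDictionaryPage(String column, int rowChunk, int page, byte[] bytes);
}

// Stand-in for the Cassandra-backed writer: keys follow the proposed scheme
// <parquet-file-name>:<row-chunk>:<column>:<page>; a Map plays the Data CF.
class ParquetCassandraWriter extends ParquetDataWriter {
    private final String fileName;
    final Map<String, byte[]> dataCf = new LinkedHashMap<>();

    ParquetCassandraWriter(String fileName) {
        this.fileName = fileName;
    }

    private String key(int rowChunk, String column, int page) {
        return fileName + ":" + rowChunk + ":" + column + ":" + page;
    }

    @Override
    void writeDataPage(String column, int rowChunk, int page, byte[] bytes) {
        dataCf.put(key(rowChunk, column, page), bytes);
    }

    @Override
    void writeDictionaryPage(String column, int rowChunk, int page, byte[] bytes) {
        // Dictionary pages could get a distinguishing suffix (an assumption here).
        dataCf.put(key(rowChunk, column, page) + ":dict", bytes);
    }
}

public class Main {
    public static void main(String[] args) {
        ParquetCassandraWriter w = new ParquetCassandraWriter("events.parquet");
        w.writeDataPage("user_id", 0, 0, new byte[]{1, 2, 3});
        w.writeDataPage("user_id", 0, 1, new byte[]{4, 5});
        for (String k : w.dataCf.keySet()) {
            System.out.println(k);
        }
    }
}
```

Because each page lands under its own key, independent pages could be written by parallel workers without coordinating on a single file handle, which is the motivation stated above.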