Issac Buenrostro created PARQUET-224:
----------------------------------------

             Summary: Implement writing Parquet files into Cassandra natively
                 Key: PARQUET-224
                 URL: https://issues.apache.org/jira/browse/PARQUET-224
             Project: Parquet
          Issue Type: New Feature
            Reporter: Issac Buenrostro
            Priority: Minor


Writing Parquet files natively into Cassandra could allow parallel writes of 
multiple pages into different cells, and low-latency reads over a persistent 
connection to C*.

Each page could be written to a separate C* cell, with page metadata written 
into a separate column family.
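
As a data model, something like the following could back those two column 
families (a minimal sketch, assuming CQL3 and the DataStax Java driver; the 
keyspace and the table names parquet.page_data / parquet.page_metadata are 
placeholders, not part of this proposal):

{code:java}
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

// Sketch only: one cell per page in page_data, page metadata and the
// file footer in page_metadata. All names here are hypothetical.
public class ParquetCassandraSchema {
  public static void main(String[] args) {
    Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
    Session session = cluster.connect();
    session.execute("CREATE KEYSPACE IF NOT EXISTS parquet WITH replication = "
        + "{'class': 'SimpleStrategy', 'replication_factor': 1}");
    // One cell per page, keyed by a composite string key (see below)
    session.execute("CREATE TABLE IF NOT EXISTS parquet.page_data ("
        + "page_key text PRIMARY KEY, data blob)");
    // Page metadata plus the footer, keyed by <parquet-file-name>
    session.execute("CREATE TABLE IF NOT EXISTS parquet.page_metadata ("
        + "meta_key text PRIMARY KEY, metadata blob)");
    cluster.close();
  }
}
{code}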

A possible way to implement this:

- abstract ParquetFileWriter -> ParquetDataWriter, where writeDictionaryPage 
and writeDataPage are abstract methods (sketched below).
- ParquetFileWriter extends ParquetDataWriter, writing the data to 
Hadoop-compatible files.
- ParquetCassandraWriter extends ParquetDataWriter, writing data to Cassandra:
-- for each page, metadata is written to the Metadata CF, with key 
<parquet-file-name>:<row-chunk>:<column>:<page>
-- for each page, data is written to the Data CF, with key 
<parquet-file-name>:<row-chunk>:<column>:<page>
-- the footer is written to the Metadata CF, with key <parquet-file-name>
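
A minimal sketch of the writer side, using the tables from the schema sketch 
above (method signatures are heavily simplified; the real parquet-mr write 
path carries encodings, statistics, and compressed/uncompressed sizes rather 
than raw byte arrays):

{code:java}
import java.nio.ByteBuffer;
import com.datastax.driver.core.Session;

// Backend-specific page sinks; everything above this layer stays shared.
abstract class ParquetDataWriter {
  public abstract void writeDictionaryPage(String column, int page, byte[] bytes);
  public abstract void writeDataPage(String column, int page, byte[] bytes);
  public abstract void writeFooter(byte[] footerBytes);
}

// Writes each page into its own Cassandra cell, per the key scheme above.
class ParquetCassandraWriter extends ParquetDataWriter {
  private final Session session;
  private final String fileName;
  private int rowChunk = 0; // index of the row group currently being written

  ParquetCassandraWriter(Session session, String fileName) {
    this.session = session;
    this.fileName = fileName;
  }

  // <parquet-file-name>:<row-chunk>:<column>:<page>
  private String key(String column, int page) {
    return fileName + ":" + rowChunk + ":" + column + ":" + page;
  }

  @Override
  public void writeDictionaryPage(String column, int page, byte[] bytes) {
    session.execute(
        "INSERT INTO parquet.page_data (page_key, data) VALUES (?, ?)",
        key(column, page), ByteBuffer.wrap(bytes));
  }

  @Override
  public void writeDataPage(String column, int page, byte[] bytes) {
    session.execute(
        "INSERT INTO parquet.page_data (page_key, data) VALUES (?, ?)",
        key(column, page), ByteBuffer.wrap(bytes));
  }

  @Override
  public void writeFooter(byte[] footerBytes) {
    // The footer is keyed by the file name alone.
    session.execute(
        "INSERT INTO parquet.page_metadata (meta_key, metadata) VALUES (?, ?)",
        fileName, ByteBuffer.wrap(footerBytes));
  }
}
{code}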

- abstract ParquetFileReader -> ParquetDataReader, where readNextRowGroup and 
readFooter are abstract methods; Chunk will also need to be abstract (sketched 
below).
- ParquetFileReader extends ParquetDataReader, reading from Hadoop-compatible 
files.
- ParquetCassandraReader extends ParquetDataReader, reading from Cassandra.
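
A corresponding sketch of the reader side (again simplified: in parquet-mr, 
readNextRowGroup assembles a PageReadStore from Chunks, which is collapsed 
here into a page-level read):

{code:java}
import java.nio.ByteBuffer;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

// Simplified abstraction; the real reader would expose readNextRowGroup
// and an abstract Chunk rather than raw page bytes.
abstract class ParquetDataReader {
  public abstract byte[] readFooter();
  public abstract byte[] readPage(int rowChunk, String column, int page);
}

class ParquetCassandraReader extends ParquetDataReader {
  private final Session session;
  private final String fileName;

  ParquetCassandraReader(Session session, String fileName) {
    this.session = session;
    this.fileName = fileName;
  }

  @Override
  public byte[] readFooter() {
    Row row = session.execute(
        "SELECT metadata FROM parquet.page_metadata WHERE meta_key = ?",
        fileName).one();
    return toBytes(row.getBytes("metadata"));
  }

  @Override
  public byte[] readPage(int rowChunk, String column, int page) {
    String key = fileName + ":" + rowChunk + ":" + column + ":" + page;
    Row row = session.execute(
        "SELECT data FROM parquet.page_data WHERE page_key = ?", key).one();
    return toBytes(row.getBytes("data"));
  }

  private static byte[] toBytes(ByteBuffer buf) {
    byte[] out = new byte[buf.remaining()];
    buf.get(out);
    return out;
  }
}
{code}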

- ParquetDataWriter and ParquetDataReader are instantiated through reflection, 
so the storage backend can be selected at runtime (sketched below).
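
Instantiation through reflection could look like this, reusing the 
ParquetDataWriter sketched above (the config key parquet.data.writer.class 
and the no-arg constructor requirement are assumptions, not settled API):

{code:java}
import org.apache.hadoop.conf.Configuration;

// Hypothetical factory: the implementation class is named in the Hadoop
// Configuration and must expose a no-arg constructor.
public class ParquetDataWriters {
  public static ParquetDataWriter fromConf(Configuration conf) throws Exception {
    String className = conf.get("parquet.data.writer.class");
    return (ParquetDataWriter) Class.forName(className)
        .getDeclaredConstructor()
        .newInstance();
  }
}
{code}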


