Hi all!

I program in Java and I use Parquet with Hadoop because I need to write/read to/from HDFS. I'm a bit confused by the apparent contradiction between the columnar nature of Parquet and the ParquetReader/ParquetWriter API, both in version 1.9.0 of parquet-hadoop from org.apache.parquet and in version 1.6.0 of parquet-hadoop from com.twitter. They require me to write record by record, even though I have the columns at hand:

    Iterator<Value> itr = theValues.iterator();
    while (itr.hasNext()) {
        writer.write(groupFromValue(itr.next()));
    }
    writer.close();

Did I fail to notice a package or function? Is there a way to write columns directly? If not, could anybody please explain the contradiction between the columnar nature of Parquet and the row-by-row read/write strategy?
Is it for technical reasons, perhaps because of some requirements of the record shredding and assembly algorithm? A URL would suffice.

Thank you in advance,
Joerg