Hi all!
I program in Java and use Parquet with Hadoop because I need to write to and
read from HDFS. I'm a bit confused by the apparent contradiction between the
columnar nature of Parquet and the ParquetReader/ParquetWriter API in version
1.9.0 of parquet-hadoop from org.apache.parquet and version 1.6.0 of
parquet-hadoop from com.twitter.
Both require writing record by record, even though I already have the columns at hand:
Iterator<Value> itr = theValues.iterator();
while (itr.hasNext()) {
    // convert each Value to a Group and write it as one record
    writer.write(groupFromValue(itr.next()));
}
writer.close();
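
For context, here is a minimal sketch of how I set up such a writer with the
example API in parquet-hadoop 1.9.0 (the schema, the Value type, and its
getters are simplified placeholders, and groupFromValue from above is inlined):

import org.apache.hadoop.fs.Path;
import org.apache.parquet.example.data.Group;
import org.apache.parquet.example.data.simple.SimpleGroupFactory;
import org.apache.parquet.hadoop.ParquetWriter;
import org.apache.parquet.hadoop.example.ExampleParquetWriter;
import org.apache.parquet.schema.MessageType;
import org.apache.parquet.schema.MessageTypeParser;

// placeholder schema with two columns
MessageType schema = MessageTypeParser.parseMessageType(
    "message example { required int32 id; required binary name (UTF8); }");

ParquetWriter<Group> writer = ExampleParquetWriter
    .builder(new Path("hdfs:///tmp/example.parquet"))
    .withType(schema)
    .build();

SimpleGroupFactory factory = new SimpleGroupFactory(schema);
for (Value v : theValues) {
    // one Group per record; getId()/getName() are hypothetical getters
    Group g = factory.newGroup()
        .append("id", v.getId())
        .append("name", v.getName());
    writer.write(g);
}
writer.close();

So each write() call takes a whole record, never a column.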
Did I fail to notice a package or function? Is there a way to write columns
directly?
If not: could anybody please explain the contradiction between the columnar
nature of Parquet and the row-by-row read/write strategy?
Is it for technical reasons, perhaps because of some requirements of the
record shredding and assembly algorithm?
A URL would suffice.
Thank you in advance
Joerg