Hi all!
I program in Java and I use Parquet with Hadoop because I need to write/read 
to/from HDFS. I'm a bit confused by the contradiction between the columnar 
nature of Parquet and the ParquetReader/ParquetWriter in version 1.9.0 of 
parquet-hadoop from org.apache.parquet and version 1.6.0 of parquet-hadoop 
from com.twitter.
They require writing row by row even though I have the columns at hand:
Iterator<Value> itr = theValues.iterator();
while (itr.hasNext()) {
    // write() takes one fully assembled record (row) at a time
    writer.write(groupFromValue(itr.next()));
}
writer.close();
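
For context, here is a stripped-down, self-contained version of what I'm 
doing (the schema, path, and values are placeholders, and I'm using the 
example Group API from org.apache.parquet, which I believe ships with 
parquet-hadoop 1.9.0, instead of my real write support):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.example.data.Group;
import org.apache.parquet.example.data.simple.SimpleGroupFactory;
import org.apache.parquet.hadoop.ParquetWriter;
import org.apache.parquet.hadoop.example.ExampleParquetWriter;
import org.apache.parquet.schema.MessageType;
import org.apache.parquet.schema.MessageTypeParser;

public class RowByRowWrite {
    public static void main(String[] args) throws Exception {
        // Placeholder schema; my real schema has more columns.
        MessageType schema = MessageTypeParser.parseMessageType(
            "message example { required int32 id; required binary name (UTF8); }");

        Configuration conf = new Configuration();
        SimpleGroupFactory factory = new SimpleGroupFactory(schema);

        // Placeholder path; in my setup this points into HDFS.
        try (ParquetWriter<Group> writer = ExampleParquetWriter
                .builder(new Path("/tmp/example.parquet"))
                .withConf(conf)
                .withType(schema)
                .build()) {
            // The writer accepts one assembled record (row) at a time ...
            for (int i = 0; i < 3; i++) {
                Group row = factory.newGroup()
                        .append("id", i)
                        .append("name", "row-" + i);
                writer.write(row);
            }
        }
        // ... even though the file it produces stores each column contiguously.
    }
}

Even in this minimal sketch, write() consumes whole rows, although the 
resulting file lays the data out column by column.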
Did I fail to notice a package or function? Is there a way to write columns 
directly?
If not: could anybody please explain the contradiction between the columnar 
nature of Parquet and the row-by-row read/write strategy?

Is it for technical reasons, perhaps because of some requirement of the record 
shredding and assembly algorithm?
A URL would suffice.
Thank you in advance
Joerg
