Hi all!
I have a general question concerning PARQUET.
PARQUET is a columnar store, but the typical Apache PARQUET writer/reader loops
use a row-by-row strategy:
Iterator<Valuet> itr = theValues.iterator();
while (itr.hasNext()) {
    writer.write(groupFromValue(itr.next()));
}
writer.close();
Assume I had the columns at hand. This procedure requires converting them into
rows first. Is there a way to write columns directly? If not, could anybody
please explain the contradiction between the columnar nature of PARQUET and the
row-by-row read/write strategy?
Is it for technical reasons, perhaps because of some requirements of the record
shredding and assembly algorithm?
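To make the conversion step concrete, here is a minimal sketch of what I mean by turning columns into rows before writing. It uses plain Java collections only. RowSink is a hypothetical stand-in for a row-oriented writer (it is not part of the PARQUET API), and the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ColumnsToRows {

    // Hypothetical stand-in for a row-oriented writer; not a real PARQUET type.
    public interface RowSink {
        void write(Object[] row);
    }

    // Zips equally long columns into rows and hands each row to the sink.
    public static void writeColumnsAsRows(List<Object[]> columns, RowSink sink) {
        if (columns.isEmpty()) {
            return;
        }
        int rowCount = columns.get(0).length;
        for (Object[] column : columns) {
            if (column.length != rowCount) {
                throw new IllegalArgumentException("columns differ in length");
            }
        }
        for (int r = 0; r < rowCount; r++) {
            // One row takes the r-th value of every column.
            Object[] row = new Object[columns.size()];
            for (int c = 0; c < columns.size(); c++) {
                row[c] = columns.get(c)[r];
            }
            sink.write(row);
        }
    }

    public static void main(String[] args) {
        List<Object[]> columns = new ArrayList<>();
        columns.add(new Object[] {1L, 2L, 3L});        // e.g. an id column
        columns.add(new Object[] {"a", "b", "c"});     // e.g. a name column
        writeColumnsAsRows(columns, row -> System.out.println(Arrays.toString(row)));
    }
}
```

With a real writer, the sink would build a Group from each row and pass it to writer.write(...), as in the loop above. This is exactly the extra pass over the data that I would like to avoid.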
A URL would suffice.
Thank you in advance
Joerg