Did you also set the row group size? It looks like this row group is ~103MB, which doesn't make sense with your block size (unless I'm reading the output wrong). I'm not really sure how much block size would matter either. The row group will only get processed by a single task even if there are multiple "HDFS" blocks covering it.

How did you arrive at 16KB for page size?
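For reference, a minimal sketch of how those writer knobs are set through ParquetOutputFormat (2015-era parquet-mr package names, pre org.apache.parquet; the values are just the ones under discussion, not recommendations):

    import java.io.IOException;
    import org.apache.hadoop.mapreduce.Job;
    import parquet.hadoop.ParquetOutputFormat;
    import parquet.hadoop.metadata.CompressionCodecName;

    public class WriterConfigSketch {
      public static void main(String[] args) throws IOException {
        Job job = Job.getInstance();
        ParquetOutputFormat.setCompression(job, CompressionCodecName.LZO);
        ParquetOutputFormat.setBlockSize(job, 32 * 1024 * 1024);   // row group target: 32 MB
        ParquetOutputFormat.setPageSize(job, 16 * 1024);           // data pages: 16 KB
        ParquetOutputFormat.setDictionaryPageSize(job, 16 * 1024); // dictionary pages: 16 KB
      }
    }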

rb

On 04/03/2015 09:52 AM, Eugen Cepoi wrote:
Here is one of the results. It is for the execution with the config I was
expecting to perform the best based on my sampled data.

Compression: LZO, page size and dictionary page size: 16 KB, block size: 32 MB.
There are 32 parts totaling 911 MB on S3 (so a single file is in fact less
than 32 MB). I am not sure the block size actually matters that much, since
the data is on S3 and not HDFS... :(

When I just read all the fields, it is much worse than with raw Thrift. If I
select one nested field (foo/**, where foo has only 2 leaves) and a few
direct leaves, then performance is similar to reading everything without any
filter. When selecting only ~5 leaves, performance is similar to raw Thrift.
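For reference, a minimal sketch of how such a projection can be configured when reading through parquet-thrift; the filter key and semicolon-separated glob syntax are taken from the 2015-era ThriftReadSupport (treat the exact key as an assumption), and the column names are placeholders:

    import java.io.IOException;
    import org.apache.hadoop.mapreduce.Job;
    import parquet.hadoop.thrift.ParquetThriftInputFormat;

    public class ProjectedReadSketch {
      public static void main(String[] args) throws IOException {
        Job job = Job.getInstance();
        // Keep the whole subtree under foo plus two direct leaves.
        job.getConfiguration().set("parquet.thrift.column.filter", "foo/**;a;b");
        job.setInputFormatClass(ParquetThriftInputFormat.class);
      }
    }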

Thanks!


row group 1: RC:283052 TS:107919094 OFFSET:4
--------------------------------------------------------------------------------
a:        INT64  LZO DO:0 FPO:4 SZ:365710/2213388/6,05 VC:283052 ENC:PLAIN_DICTIONARY,PLAIN,BIT_PACKED
b:        INT64  LZO DO:0 FPO:365714 SZ:505835/2228766/4,41 VC:283052 ENC:PLAIN_DICTIONARY,PLAIN,BIT_PACKED
c:        BINARY LZO DO:0 FPO:871549 SZ:10376384/11393987/1,10 VC:283052 ENC:PLAIN,BIT_PACKED
d:        BINARY LZO DO:0 FPO:11247933 SZ:70986/78575/1,11 VC:283052 ENC:PLAIN_DICTIONARY,BIT_PACKED
e:        BINARY LZO DO:0 FPO:11318919 SZ:2159/2603/1,21 VC:283052 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
f:        BINARY LZO DO:0 FPO:11321078 SZ:41917/47856/1,14 VC:283052 ENC:PLAIN,BIT_PACKED,RLE
g:
.g1:      BINARY LZO DO:0 FPO:11362995 SZ:38549/37372/0,97 VC:283052 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
.g2:
..g21:    INT64  LZO DO:0 FPO:11401544 SZ:61882/388906/6,28 VC:283052 ENC:PLAIN_DICTIONARY,PLAIN,BIT_PACKED,RLE
..g22:    BINARY LZO DO:0 FPO:11463426 SZ:1144390/7158351/6,26 VC:283052 ENC:PLAIN_DICTIONARY,PLAIN,BIT_PACKED,RLE
h:
.h1:      BINARY LZO DO:0 FPO:12607816 SZ:63896/68688/1,07 VC:283052 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
.h2:
..h21:    INT64  LZO DO:0 FPO:12671712 SZ:1169087/2207025/1,89 VC:283052 ENC:PLAIN,BIT_PACKED,RLE
..h22:    BINARY LZO DO:0 FPO:13840799 SZ:29116/40513/1,39 VC:283052 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
i:
.i1:      BINARY LZO DO:0 FPO:13869915 SZ:10933/13648/1,25 VC:283052 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
.i2:
..i21:    INT64  LZO DO:0 FPO:13880848 SZ:11523/17795/1,54 VC:283052 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
..i22:    BINARY LZO DO:0 FPO:13892371 SZ:135510/248827/1,84 VC:283052 ENC:PLAIN,BIT_PACKED,RLE
j:
.j1:      BINARY LZO DO:0 FPO:14027881 SZ:37025/35497/0,96 VC:283052 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
.j2:
..j21:    INT64  LZO DO:0 FPO:14064906 SZ:28196/37242/1,32 VC:283052 ENC:PLAIN,BIT_PACKED,RLE
..j22:    BINARY LZO DO:0 FPO:14093102 SZ:945481/6491450/6,87 VC:283052 ENC:PLAIN_DICTIONARY,PLAIN,BIT_PACKED,RLE
k:        BINARY LZO DO:0 FPO:15038583 SZ:39147/36673/0,94 VC:283052 ENC:PLAIN_DICTIONARY,BIT_PACKED
l:        BINARY LZO DO:0 FPO:15077730 SZ:58233/60236/1,03 VC:283052 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
m:        BINARY LZO DO:0 FPO:15135963 SZ:28326/30663/1,08 VC:283052 ENC:PLAIN_DICTIONARY,BIT_PACKED
n:        BINARY LZO DO:0 FPO:15164289 SZ:2223225/26327896/11,84 VC:283052 ENC:PLAIN_DICTIONARY,PLAIN,BIT_PACKED,RLE
o:        BINARY LZO DO:0 FPO:17387514 SZ:690400/4470368/6,48 VC:283052 ENC:PLAIN_DICTIONARY,PLAIN,BIT_PACKED,RLE
p:        BINARY LZO DO:0 FPO:18077914 SZ:39/27/0,69 VC:283052 ENC:PLAIN,BIT_PACKED,RLE
q:        BINARY LZO DO:0 FPO:18077953 SZ:1099508/7582263/6,90 VC:283052 ENC:PLAIN_DICTIONARY,PLAIN,BIT_PACKED,RLE
r:        BINARY LZO DO:0 FPO:19177461 SZ:1372666/8752125/6,38 VC:283052 ENC:PLAIN_DICTIONARY,PLAIN,BIT_PACKED,RLE
s:        BINARY LZO DO:0 FPO:20550127 SZ:52878/51840/0,98 VC:283052 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
t:        BINARY LZO DO:0 FPO:20603005 SZ:51548/49339/0,96 VC:283052 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
u:
.map:
..key:    BINARY LZO DO:0 FPO:20654553 SZ:75794/85569/1,13 VC:291795 ENC:PLAIN_DICTIONARY,RLE
..value:  BINARY LZO DO:0 FPO:20730347 SZ:58334/62448/1,07 VC:291795 ENC:PLAIN_DICTIONARY,RLE
v:
.map:
..key:    BINARY LZO DO:0 FPO:20788681 SZ:1072311/2977966/2,78 VC:2674014 ENC:PLAIN_DICTIONARY,RLE
..value:  BINARY LZO DO:0 FPO:21860992 SZ:6997331/24721192/3,53 VC:2674014 ENC:PLAIN_DICTIONARY,PLAIN,RLE


2015-04-03 18:22 GMT+02:00 Eugen Cepoi <cepoi.eu...@gmail.com>:

Hey Ryan,

2015-04-03 18:00 GMT+02:00 Ryan Blue <b...@cloudera.com>:

On 04/02/2015 07:38 AM, Eugen Cepoi wrote:

Hi there,

I was testing parquet with thrift to see if there would be an
interesting performance gain compared to using just thrift. But in my
test I found that just using plain thrift with lzo compression was
faster.


This doesn't surprise me too much because of how the Thrift object model
works. (At least, assuming I understand it right. Feel free to correct me.)

Thrift wants to read and write using the TProtocol, which provides a
layer like Parquet's Converters that is an intermediary between the object
model and underlying encodings. Parquet implements TProtocol by building a
list of the method calls a record will make to read or write itself, then
allowing the record to read that list. I think this has the potential to
slow down reading and writing.
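Just to illustrate the indirection, a conceptual sketch of that buffering (invented names, not the actual parquet-thrift code):

    import java.util.ArrayList;
    import java.util.List;

    // Invented names for illustration only -- not the real parquet-thrift classes.
    public class EventBufferSketch {
      interface Event { void replay(StringBuilder out); } // stand-in for one TProtocol call

      public static void main(String[] args) {
        // 1) The record is first assembled as a list of protocol events...
        List<Event> events = new ArrayList<>();
        events.add(out -> out.append("readI64 -> 42\n"));
        events.add(out -> out.append("readString -> \"abc\"\n"));

        // 2) ...then the generated struct "reads itself" by replaying the
        // list, so every value crosses the protocol layer a second time.
        StringBuilder record = new StringBuilder();
        for (Event e : events) {
          e.replay(record);
        }
        System.out.print(record);
      }
    }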


It's on my todo list to try to get this working using avro-thrift, which
sets the fields directly.



Yes, the double "ser/de" overhead makes sense to me, but I was not expecting
such a big difference.
I didn't read the code doing the conversion, but with Thrift we can set the
fields directly, at least if what you mean is setting them without reflection.
So basically one can create an "empty" instance via the default constructor
(or reflection) and then use the setFieldValue method with the corresponding
_Fields value (an enum) and the field's value. We can even reuse those
instances. I think this would perform better than using avro-thrift, which
adds another layer. If you can point me to the code of interest, I can maybe
be of some help :)
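For concreteness, a minimal sketch of the reflection-free pattern I mean; MyRecord is a placeholder for any Thrift-generated struct:

    // MyRecord is a placeholder: every Thrift-generated struct gets a default
    // constructor, a _Fields enum, and the setFieldValue/clear methods used here.
    public class SetFieldSketch {
      public static void main(String[] args) {
        MyRecord r = new MyRecord();                    // "empty" instance via default ctor
        MyRecord._Fields f = MyRecord._Fields.findByThriftId(1);
        r.setFieldValue(f, 42L);                        // direct set, no reflection
        r.clear();                                      // reset and reuse the instance
      }
    }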

Does the impl based on avro perform much better?



That's just to see if it might be faster constructing the records
directly, since we rely on TProtocol to make both Thrift and Scrooge
objects work.

I used a small EMR cluster with 2 m3.xlarge core nodes.
The sampled input has 9 million records, about 1 GB (on S3), with ~20 fields
and some nested structures and maps. I just do a count on it.
I tried playing with different tuning options, but none seemed to really
improve things (the picture shows some global metrics for the different
options).

I also tried with a larger sample of a couple of gigabytes (compressed
output size), but I had similar results.


Could you post the results of `parquet-tools meta`? I'd like to see what
your column layout looks like (the final column chunk sizes).

If your data ends up with only a column or two dominating the row group
and you always select those columns, then you probably wouldn't see an
improvement. You need at least one "big" column chunk that you're ignoring.


I'll provide those shortly. BTW, I had some warnings indicating that it
couldn't skip row groups due to predicates, or something like that. I'll try
to provide those too.


Also, what compression did you use for the Parquet files?


LZO; it is also the one I am using for the raw Thrift data.

Thank you!
Eugen




In the end, the only situation where I can see it performing significantly
better is when reading a few columns from a dataset that has a large number
of columns. But since the schemas are hand-written, I don't imagine having
data structures with hundreds of columns.


I think we'll know more from taking a look at the row groups and column
chunk sizes.


I am wondering if I am doing something wrong (especially given the large
difference between plain Thrift and Parquet + Thrift), or if the dataset I
used just isn't a good fit for Parquet?

Thanks!


Cheers,
Eugen


rb


--
Ryan Blue
Software Engineer
Cloudera, Inc.






--
Ryan Blue
Software Engineer
Cloudera, Inc.
