wpleonardo commented on PR #1375:
URL: https://github.com/apache/orc/pull/1375#issuecomment-1755416934

   > @wpleonardo Do we have any performance benchmarks for this PR? @alexey-milovidov, maybe you are interested in it.
   > 
   > I tried to use this feature in ClickHouse (https://github.com/clickHouse/ClickHouse) but can't see any performance improvement.
   > 
   > Q: `select * from file('/data1/clickhouse_official/data/user_files/test.orc') format Null;`
   > 
   > With AVX512:
   > 
   > ```
   > 0 rows in set. Elapsed: 3.659 sec. Processed 1.13 million rows, 486.19 MB (308.68 thousand rows/s., 132.88 MB/s.)
   > 0 rows in set. Elapsed: 3.653 sec. Processed 1.20 million rows, 517.87 MB (329.40 thousand rows/s., 141.76 MB/s.)
   > 0 rows in set. Elapsed: 3.719 sec. Processed 1.13 million rows, 486.19 MB (303.70 thousand rows/s., 130.74 MB/s.)
   > ```
   > 
   > Without AVX512:
   > 
   > ```
   > 0 rows in set. Elapsed: 3.565 sec. Processed 1.13 million rows, 486.19 MB (316.81 thousand rows/s., 136.38 MB/s.)
   > 0 rows in set. Elapsed: 3.540 sec. Processed 1.20 million rows, 517.87 MB (339.91 thousand rows/s., 146.28 MB/s.)
   > 0 rows in set. Elapsed: 3.681 sec. Processed 1.20 million rows, 517.87 MB (326.90 thousand rows/s., 140.69 MB/s.)
   > ```
   > 
   > About the test orc file:
   > 
   > ```
   > $ du -sh test.orc                                                     
   > 505M       test.orc
   > 
   > 
   > $ orc-metadata ./test.orc                           
   > { "name": "./test.orc",
   >   "type": "struct<reporttime:bigint,appid:bigint,uid:bigint,platform:int,nettype:int,clientversioncode:bigint,sdkversioncode:bigint,statid:string,statversion:int,countrycode:string,language:string,model:string,osversion:string,channel:string,heartcount:int,msgcount:int,giftcount:int,barragecount:int,gid:string,entrytype:int,prefetchedms:int,linkdstate:int,networkavailable:int,starttimestamp:bigint,sessionlogints:int,medialogints:int,sdkboundts:int,msconnectedts:int,vsconnectedts:int,firstiframets:int,ownerstatus:int,stopreason:int,totaltime:int,cpuusageavg:int,memusageavg:int,backgroundtotal:bigint,foregroundtotal:bigint,firstvideopackts:int,firstvoicerecvts:int,firstvoiceplayts:int,firstiframeassemblets:int,uiinitts:int,uiloadedts:int,uiappearedts:int,setvideoviewts:int,blurviewdimissts:int,preparesdkinqueuets:int,preparesdkexects:int,startsdkinqueuets:int,startsdkexects:int,sdkjoinchannelinqueuets:int,sdkjoinchannelexects:int,lastsdkleavechannelinqueuets:int,lastsdkleavechannelexects:int,unused_1:int,unused_2:int,setvideoviewinqueuets:int,setvideoviewexects:int,livetype:int,audiostatus:int,firstiframesize:bigint,firstiframedecodetime:bigint,extras:bigint,entrancetype:int,entrancemode:int,mclientip:bigint,mnc:bigint,mcc:bigint,vsipsuccess:bigint,msipsuccess:bigint,vsipfail:bigint,msipfail:bigint,mediaflag:bigint,dispatchid:string,proxyflag:int,redirectcount:int,directorrescode:int,subentrancetab:string,logininfolist:array<struct<strategy:bigint,ip:bigint,loginStat:bigint,reserve1:bigint,reserve2:bigint>>,playcentertype:int,videomutetype:bigint,owneruid:bigint,extra:string>",
   >   "rows": 1203317,
   >   "stripe count": 12,
   >   "format": "0.12", "writer version": "future - 9",
   >   "compression": "snappy", "compression block": 65536,
   >   "file length": 529207118,
   >   "content": 529182229, "stripe stats": 21150, "footer": 3712, "postscript": 26,
   >   "row index stride": 10000,
   >   "user metadata": {
   >     "org.apache.spark.version": "3.3.2"
   >   },
   >   "stripes": [
   >     { "stripe": 0, "rows": 117760,
   >       "offset": 3, "length": 50876922,
   >       "index": 23728, "data": 50851823, "footer": 1371
   >     },
   >     { "stripe": 1, "rows": 117760,
   >       "offset": 50876925, "length": 50948680,
   >       "index": 23679, "data": 50923619, "footer": 1382
   >     },
   >     { "stripe": 2, "rows": 62050,
   >       "offset": 101825605, "length": 26902880,
   >       "index": 15322, "data": 26886211, "footer": 1347
   >     },
   >     { "stripe": 3, "rows": 117760,
   >       "offset": 128728485, "length": 50474083,
   >       "index": 24110, "data": 50448601, "footer": 1372
   >     },
   >     { "stripe": 4, "rows": 117760,
   >       "offset": 179202568, "length": 50413042,
   >       "index": 23858, "data": 50387825, "footer": 1359
   >     },
   >     { "stripe": 5, "rows": 63570,
   >       "offset": 229615610, "length": 27504277,
   >       "index": 14890, "data": 27488029, "footer": 1358
   >     },
   >     { "stripe": 6, "rows": 117760,
   >       "offset": 268435456, "length": 50981984,
   >       "index": 24191, "data": 50956424, "footer": 1369
   >     },
   >     { "stripe": 7, "rows": 117760,
   >       "offset": 319417440, "length": 51017894,
   >       "index": 23792, "data": 50992731, "footer": 1371
   >     },
   >     { "stripe": 8, "rows": 61720,
   >       "offset": 370435334, "length": 26840720,
   >       "index": 15246, "data": 26824109, "footer": 1365
   >     },
   >     { "stripe": 9, "rows": 117760,
   >       "offset": 397276054, "length": 49971095,
   >       "index": 23487, "data": 49946233, "footer": 1375
   >     },
   >     { "stripe": 10, "rows": 117760,
   >       "offset": 447247149, "length": 50259825,
   >       "index": 24090, "data": 50234369, "footer": 1366
   >     },
   >     { "stripe": 11, "rows": 73897,
   >       "offset": 497506974, "length": 31675255,
   >       "index": 16948, "data": 31656952, "footer": 1355
   >     }
   >   ]
   > }
   > ```
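The three elapsed times quoted for each configuration can be averaged to size the gap. A small Python sketch (variable names are illustrative; the numbers are copied from the runs above):

```python
# Elapsed times in seconds, copied from the three ClickHouse runs quoted above.
with_avx512 = [3.659, 3.653, 3.719]
without_avx512 = [3.565, 3.540, 3.681]


def mean(xs):
    """Arithmetic mean of a non-empty list."""
    return sum(xs) / len(xs)


avg_with = mean(with_avx512)
avg_without = mean(without_avx512)
# Positive value means the AVX512 build was slower on average.
slowdown_pct = (avg_with / avg_without - 1.0) * 100.0

print(f"with AVX512:    {avg_with:.3f} s")
print(f"without AVX512: {avg_without:.3f} s")
print(f"difference:     {slowdown_pct:+.1f}%")
```

This puts the AVX512 build roughly 2% slower on average. The runs did not all process the same number of rows (1.13 vs 1.20 million), so the comparison is rough.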
   
   Yes, we have a performance micro-benchmark for this PR. If ORC uses its default aligned fixed bit widths, AVX512 bit-unpacking performs almost the same as the non-AVX512 path. But if ORC uses unaligned bit widths, AVX512 bit-unpacking is almost 6X faster than non-AVX512, and its performance is close to that of non-AVX512 with aligned fixed bit widths.
   So maybe you could check whether the ClickHouse ORC setting uses aligned bit widths or not.
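Why bit width matters can be illustrated with a minimal scalar sketch. This is not ORC's decoder and not the AVX512 code from this PR. It assumes values are packed back to back, most-significant bit first, and it treats widths such as 1, 2, 4, 8, 16, 24 as "aligned" (an assumption about what the aligned setting produces). `pack_bits`, `unpack_bits` and `count_cross_byte_values` are hypothetical helper names:

```python
# Hypothetical scalar sketch; not ORC's decoder and not the AVX512 path from this PR.
# Assumption: values are packed back to back, most-significant bit first.


def pack_bits(values, bit_width):
    """Pack unsigned ints of `bit_width` bits each into bytes, MSB first."""
    out = bytearray((len(values) * bit_width + 7) // 8)
    bit_pos = 0
    for v in values:
        for shift in range(bit_width - 1, -1, -1):
            if (v >> shift) & 1:
                out[bit_pos // 8] |= 0x80 >> (bit_pos % 8)
            bit_pos += 1
    return bytes(out)


def unpack_bits(data, bit_width, count):
    """Bit-by-bit scalar unpacking of `count` values of `bit_width` bits."""
    values = []
    bit_pos = 0
    for _ in range(count):
        value = 0
        for _ in range(bit_width):
            bit = (data[bit_pos // 8] >> (7 - bit_pos % 8)) & 1
            value = (value << 1) | bit
            bit_pos += 1
        values.append(value)
    return values


def count_cross_byte_values(bit_width, count):
    """Count values that start or end mid-byte while spanning two or more bytes.

    Such values need shift-and-merge work across bytes. Values that fit inside
    one byte, or that cover whole bytes starting on a byte boundary, do not.
    """
    crossing = 0
    for i in range(count):
        offset = (i * bit_width) % 8
        fits_in_one_byte = offset + bit_width <= 8
        whole_bytes = offset == 0 and bit_width % 8 == 0
        if not (fits_in_one_byte or whole_bytes):
            crossing += 1
    return crossing


if __name__ == "__main__":
    values = [3, 17, 0, 31, 9, 22, 5, 30]  # every value fits in 5 bits
    for width in (5, 8, 16):
        packed = pack_bits(values, width)
        assert unpack_bits(packed, width, len(values)) == values
    for width in (4, 5, 8, 16, 24):
        print(width, count_cross_byte_values(width, 64))
```

With 64 values, widths 4, 8, 16 and 24 report no cross-byte values while width 5 reports 32. That extra shift-and-merge work on unaligned widths is the kind of cost a SIMD unpacker can amortize, which is consistent with the larger gain reported for unaligned widths, though this sketch does not measure it.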