[GitHub] [arrow] emkornfield commented on pull request #7175: ARROW-8794: [C++] Expand performance coverage of parquet to arrow reading

GitBox Mon, 18 May 2020 21:18:18 -0700


emkornfield commented on pull request #7175:
URL: https://github.com/apache/arrow/pull/7175#issuecomment-630568952



   > BM_ReadColumn<true,Int32Type> reflects a lot the profile I get with 
real-life dataset (nyc taxi dataset). If this can guide you in further 
performance validation.
   
   I don't think I'm going to be doing much more performance-related work past 
https://github.com/apache/arrow/pull/7143 (if you don't mind trying it out, it 
would be good to see whether it improves performance on real-world data).  The 
last potential easy performance win is pushing the "all null"/"no nulls 
remaining" checks directly into the loops (for small batch sizes I wouldn't 
expect a huge difference there).  My main goal is to get full nested 
functionality working, and I got a little distracted.
   
   Other changes will probably require a bigger refactoring than I want to take 
on right now.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
