huaxiangsun commented on pull request #3675:
URL: https://github.com/apache/hbase/pull/3675#issuecomment-918648373
> > By default, we will switch to stream reads after reading a small amount of data, several hundreds of KBs? If we read several hundreds of MBs of data in a MapReduce job, I do not think it will affect the performance too much?
>
> For a standalone Java program reading a ~5 GB file in a single JVM (... using the mapreduce snapshot APIs), this change improved run time from 90 s to 30 s. In a distributed system, it only gave about a 15% improvement (the network became the bottleneck; that's where [HBASE-26274](https://issues.apache.org/jira/browse/HBASE-26274) came into play).

The number is impressive. For the standalone Java program, is it an HDFS local read with short-circuit read enabled, or a read through a local TCP connection?
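To put the quoted question in numbers, here is a back-of-the-envelope sketch in Java. It is a hypothetical model, not HBase code: the class `PreadStreamSplit`, its methods, and the 256 KB threshold are assumptions standing in for the "several hundreds of KBs" above. It also treats the whole read as a single scanner that switches exactly once, ignoring per-scanner and per-file effects.

```java
// Hypothetical model, not HBase code: estimates how a scan's bytes split
// between positional reads (pread) and stream reads when the scanner
// switches to stream after a fixed byte threshold.
public final class PreadStreamSplit {

    // Assumed switch threshold (illustrative only, not an HBase default).
    static final long SWITCH_THRESHOLD_BYTES = 256L * 1024;

    /** Bytes served by pread before the (assumed) single switch to stream. */
    static long preadBytes(long totalBytes) {
        return Math.min(totalBytes, SWITCH_THRESHOLD_BYTES);
    }

    /** Bytes served by stream reads after the switch. */
    static long streamBytes(long totalBytes) {
        return totalBytes - preadBytes(totalBytes);
    }

    public static void main(String[] args) {
        long total = 500L * 1024 * 1024; // e.g. a 500 MB scan
        double preadShare = 100.0 * preadBytes(total) / total;
        System.out.printf("pread share: %.3f%%%n", preadShare); // prints "pread share: 0.050%"
    }
}
```

Under these assumptions only 0.05% of a 500 MB scan goes through pread, which is the intuition behind the question. The measured 90 s to 30 s improvement is therefore not explained by this simple model alone.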
