Hi Impala folks,

I was wondering whether any Impala work is required to integrate with HDFS erasure coding (planned for release in Hadoop 3, already available in alpha form in 3.0.0-alpha1). I know that Impala tries to schedule reads local to nodes and disks. With EC, though, a file's data is striped across many DataNodes, so most reads will be remote and locality isn't important.
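
To make the locality point concrete, here is a rough sketch (not actual HDFS or Impala code) of how a striped layout spreads a contiguous byte range across the data blocks of a block group. The RS(6,3) schema and the 64 KB cell size are assumptions for illustration, and the function names are hypothetical.

```python
from collections import Counter

# Assumed for illustration: an RS(6,3) policy with 64 KB striping cells.
CELL_SIZE = 64 * 1024
DATA_BLOCKS = 6


def data_block_for_offset(offset, cell_size=CELL_SIZE, data_blocks=DATA_BLOCKS):
    """Return the index of the data block holding the byte at `offset`.

    In a striped layout, consecutive cells are written round-robin
    across the data blocks of a block group.
    """
    return (offset // cell_size) % data_blocks


def bytes_per_data_block(start, length, cell_size=CELL_SIZE, data_blocks=DATA_BLOCKS):
    """Count how many bytes of the range [start, start + length) fall on each data block."""
    counts = Counter()
    pos = start
    end = start + length
    while pos < end:
        # Consume up to the end of the current cell, or the end of the range.
        cell_end = (pos // cell_size + 1) * cell_size
        chunk = min(end, cell_end) - pos
        counts[data_block_for_offset(pos, cell_size, data_blocks)] += chunk
        pos += chunk
    return dict(counts)


if __name__ == "__main__":
    # A 1 MiB scan range starting at offset 0.
    for block, nbytes in sorted(bytes_per_data_block(0, 1024 * 1024).items()):
        print(f"data block {block}: {nbytes} bytes")
```

Under those assumptions, even a 1 MiB scan range touches all six data blocks. Since those blocks normally live on different DataNodes, only a fraction of the range can be local to whichever node runs the scan.
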
Is Impala scheduling going to work out of the box? Another idea is to implement a stride-aware data format, which would re-enable locality even for striped blocks. It's not clear whether this is important, though, since EC is meant for cold data that isn't queried often.

Thanks,
Andrew
