Hi Impala folks,

I was wondering whether any Impala work is required to integrate with
HDFS erasure coding (EC), which is planned for release in Hadoop 3 and is
already available in alpha form in 3.0.0-alpha1. I know that Impala tries to
schedule reads for locality at both the node and the disk level. With EC,
though, each block's data is striped across several DataNodes, so most reads
will be remote and locality isn't important.

Is Impala's scheduling going to work out of the box with EC files?

Another idea is to implement a stride-aware data format, which would re-enable
locality even for striped blocks. It's not clear this is worth doing, though,
since EC is meant for cold data that is rarely queried.

Thanks,
Andrew

Reply via email to