[ 
https://issues.apache.org/jira/browse/IMPALA-5931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vuk Ercegovac closed IMPALA-5931.
---------------------------------
       Resolution: Fixed
    Fix Version/s: Impala 3.1.0
                   Impala 2.13.0

> Don't synthesize block metadata in the catalog for S3/ADLS
> ----------------------------------------------------------
>
>                 Key: IMPALA-5931
>                 URL: https://issues.apache.org/jira/browse/IMPALA-5931
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Catalog
>            Reporter: Dan Hecht
>            Assignee: Vuk Ercegovac
>            Priority: Major
>             Fix For: Impala 2.13.0, Impala 3.1.0
>
>
> Today, the catalog synthesizes block metadata for S3/ADLS by simply breaking
> splittable files into "blocks" of the FileSystem's default block size.
> Rather than carrying these blocks around in the catalog and distributing them
> to all impalads, we might as well generate the scan ranges on the fly during
> planning. That would save the memory and network bandwidth the blocks consume.
> That does mean the planner will have to instantiate the FileSystem and query
> it for the default block size, but for these FileSystems that is
> just a matter of reading the config.
> Perhaps the same can be done for HDFS erasure coding, though that depends on
> what a block location actually means in that context and whether block
> locations contain useful info.
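
The on-the-fly generation described above can be sketched roughly as follows. This is only an illustration, not Impala's actual planner code: the class name ScanRangeSketch, the method splitIntoRanges, and the 128 MB block size are all hypothetical. In a real planner the block size would presumably come from Hadoop's FileSystem.getDefaultBlockSize(Path) rather than a constant.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; names and values are assumptions, not Impala code. */
class ScanRangeSketch {
    /**
     * Splits a file of the given length into {offset, length} pairs of at
     * most blockSize bytes each, covering [0, fileLength) without gaps.
     */
    static List<long[]> splitIntoRanges(long fileLength, long blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        List<long[]> ranges = new ArrayList<>();
        for (long offset = 0; offset < fileLength; offset += blockSize) {
            // The last range may be shorter than a full block.
            long length = Math.min(blockSize, fileLength - offset);
            ranges.add(new long[] {offset, length});
        }
        return ranges;
    }

    public static void main(String[] args) {
        long blockSize = 128L * 1024 * 1024;  // assumed default block size
        long fileLength = 300L * 1024 * 1024; // assumed file size
        for (long[] range : splitIntoRanges(fileLength, blockSize)) {
            System.out.println("offset=" + range[0] + " length=" + range[1]);
        }
    }
}
```

Because the ranges depend only on the file length and the configured block size, nothing per-block needs to be stored in the catalog or shipped to the impalads; only the file length has to be known at planning time.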



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
