[ https://issues.apache.org/jira/browse/HAWQ-1075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Goden Yao updated HAWQ-1075:
----------------------------
    Fix Version/s: 2.0.1.0-incubating

> Restore default behavior of client-side (PXF) checksum validation when reading 
> blocks from HDFS
> ----------------------------------------------------------------------------------------------
>
>                 Key: HAWQ-1075
>                 URL: https://issues.apache.org/jira/browse/HAWQ-1075
>             Project: Apache HAWQ
>          Issue Type: Improvement
>          Components: PXF
>            Reporter: Shivram Mani
>            Assignee: Shivram Mani
>             Fix For: 2.0.1.0-incubating
>
>
> Currently, the HdfsTextSimple profile, which is the optimized PXF profile for 
> reading Text/CSV data, uses ChunkRecordReader to read chunks of records (as 
> opposed to individual records). Here, dfs.client.read.shortcircuit.skip.checksum 
> is explicitly set to true to avoid incurring any delay from checksum 
> verification while opening/reading the file/block. 
> Background Information:
> PXF uses a 2-stage process to access HDFS data. 
> In Stage 1, it fetches all the target blocks for the given file (along with 
> replica information). 
> In Stage 2 (after HAWQ prepares an optimized access plan based on locality), 
> PXF agents read the blocks in parallel.
> In almost all scenarios, Hadoop internally catches block corruption and such 
> blocks are never returned to any client requesting block locations (Stage 1). 
> In certain scenarios, however, such as block corruption without a change in 
> size, Stage 1 can still return the location of the corrupted block, and hence 
> Stage 2 needs to perform an additional checksum check. 
> With client-side checksum verification on read (the default behavior), we are 
> resilient to such checksum errors on read as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
