[
https://issues.apache.org/jira/browse/HAWQ-1075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Goden Yao updated HAWQ-1075:
----------------------------
Fix Version/s: 2.0.1.0-incubating
> Restore default behavior of client-side (PXF) checksum validation when reading
> blocks from HDFS
> ----------------------------------------------------------------------------------------------
>
> Key: HAWQ-1075
> URL: https://issues.apache.org/jira/browse/HAWQ-1075
> Project: Apache HAWQ
> Issue Type: Improvement
> Components: PXF
> Reporter: Shivram Mani
> Assignee: Shivram Mani
> Fix For: 2.0.1.0-incubating
>
>
> Currently the HdfsTextSimple profile, which is the optimized PXF profile for reading
> Text/CSV, uses ChunkRecordReader to read chunks of records (as opposed to
> individual records). Here dfs.client.read.shortcircuit.skip.checksum is
> explicitly set to true to avoid incurring any delay from checksum verification
> while opening/reading the file/block.
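> A minimal sketch of the effect of this flag (illustrative only, not the actual
> ChunkRecordReader source; the class and method names here are assumptions):
> {code:java}
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.FSDataInputStream;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
>
> // Illustrative helper: opens an HDFS file with or without client-side
> // checksum verification for short-circuit local reads.
> public class ChecksumFlagSketch {
>     public static FSDataInputStream open(Path path, boolean skipChecksum) throws Exception {
>         Configuration conf = new Configuration();
>         // The current behavior described above corresponds to skipChecksum = true;
>         // the HDFS default is false, i.e. checksums are verified on read.
>         conf.setBoolean("dfs.client.read.shortcircuit.skip.checksum", skipChecksum);
>         FileSystem fs = path.getFileSystem(conf);
>         return fs.open(path);
>     }
> }
> {code}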
> Background Information:
> PXF uses a 2-stage process to access HDFS data.
> In Stage 1, it fetches all the target blocks for the given file (along with
> replica information).
> In Stage 2 (after HAWQ prepares an optimized access plan based on locality), the PXF
> agents read the blocks in parallel.
> In almost all scenarios Hadoop internally catches block corruption and
> such blocks are never returned to any client requesting block locations
> (Stage 1). In certain scenarios, such as a block corruption without a change in
> size, Stage 1 can still return the location of the corrupted block,
> and hence Stage 2 needs to perform an additional checksum check.
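> As a hedged illustration of Stage 1 (names are assumptions, not PXF's actual
> fragmenter code), the block locations reported by the NameNode for a file can
> be listed like this:
> {code:java}
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.BlockLocation;
> import org.apache.hadoop.fs.FileStatus;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
>
> public class BlockLocationSketch {
>     public static void printBlocks(String file) throws Exception {
>         Path path = new Path(file);
>         FileSystem fs = path.getFileSystem(new Configuration());
>         FileStatus status = fs.getFileStatus(path);
>         // The NameNode only returns blocks it believes are healthy, but a
>         // corrupted block whose size is unchanged can still show up here,
>         // which is why Stage 2 relies on client-side checksum verification.
>         for (BlockLocation loc : fs.getFileBlockLocations(status, 0, status.getLen())) {
>             System.out.println(loc.getOffset() + "+" + loc.getLength()
>                     + " on " + String.join(",", loc.getHosts()));
>         }
>     }
> }
> {code}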
> With the client-side checksum check on read (the default behavior), we are resilient
> to such checksum errors on read as well.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)