GitHub user traflm opened a pull request:
https://github.com/apache/incubator-trafodion/pull/441
[TRAFODION-1920] suppress SQL error during HIVE_SCAN when encounter iâ¦
â¦nvalid value, assign null to the invalid value
This pull request try to solve the issue when Trafodion try to retrieve
data from a HDFS text file which contains invalid data. Like:
space for a numeric value
non-numeric string for a numeric value
this change will use the HIVE_SCAN_SPECAIL_MODE '2' , when that CQD is set
to 2, HIVE_SCAN will automaticlly convert the invalid data into null and
continue, instead of error out immediately.
How it does?
HIVE_SCAN need to do the data type conversion via expression evaluation.
There are two ways to evaluate an expression: pCode and eval() function. Since
the current HIVE_SCAN do conversion row by row, and details are encoded in
corresponding pCode. So the simplest way I can find out is to fix the invalid
data by set it to null and retry the whole evaluation.
In pCode and eval() function, right after a failure, it saves the offset of
the offending source data pointer. In HIVE_SCAN work() function, once it got an
evaluation error, find and fix the offending data in form of "R2" , exploded
format, and retry the conversion. The maxium retry time is the number of
columns.
So this change will only change pCode and normal evaluation a little bit.
After an error, try to remember the offset of offending source data. It will
not affect any normal evaluation/execution.
Only in HIVE_SCAN, in a special mode controlled by CQD, it will retry after
the failure.
This will not introduce any unknown side effort.
It is easier and maybe faster to change pCode execution to set the target
as null and continue, but since pCode is used in many cases, not only HIVE_SCAN
move experession. So that change will introduce unknown side effect.
This retry logic does introduce overhead, but in normal case the code path
is same as before, in special mode, if no invalid data, it will also go through
same path as before, just retry in case there is converting error.
This change only fix from HIVE_SCAN perspective, to fully support
transparent Bulkloader, we still need to change the bulkloader work() method to
tolerate other source of failure. This change will make sure there will be no
failure during HIVE_SCAN. Later, if there is CAST, or TRANSLATE or TRUNCATE
issues, we still need to change bulkload code.
So this is the first part to fully support TRAFODION-1920, but it should
handle a lot of real cases already.
Add some sample test cases in hive/TEST005 also, interesting reviewers can
check the test for more details.
Please help to review, and this will be a help in some real use cases.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/traflm/incubator-trafodion TRAFODION-1912
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-trafodion/pull/441.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #441
----
commit 8f78b446e9002ff9cd5e0f3ca2b5feead9726859
Author: Liu Ming <[email protected]>
Date: 2016-04-19T03:02:16Z
[TRAFODION-1920] suppress SQL error during HIVE_SCAN when encounter invalid
value, assign null to the invalid value
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---